tion strategies in complicated Bayesian settings leads naturally to the use of 
importance sampling techniques to assess the divergence between full-data 
and case-deleted posteriors and to provide estimates under the case-deleted 
posteriors. However, the dependability of the importance sampling estima- 
tors depends critically on the variability of the case-deleted weights. We 
provide theoretical results concerning the assessment of the dependability 
of case-deleted importance sampling estimators in several Bayesian models. 
In particular, these results allow us to establish whether or not the esti- 
mators satisfy a central limit theorem. Because the conditions we derive 
are of a simple analytical nature, the assessment of the dependability of 
the estimators can be verified routinely before estimation is performed. We 
illustrate the use of the results in several examples. 
1. Introduction 

Complex Bayesian models are fit with simulation techniques. A Monte Carlo 
method is used to generate a sample from the posterior distribution, and this 
sample is used to estimate many quantities, such as posterior means and vari- 
ances of parameters, posterior probabilities of events, predictive distributions 
of future cases, etc. For a complete analysis, one examines the data, looking 
for outliers and influential cases. One also considers information external to the 
model which suggests groups of cases that may depart from the model. When 
interesting groups of cases are found, they are dropped from the data set, and 
estimates are recomputed. The resulting case-deleted posterior distribution and 
the case-deleted estimates are of interest, as are the changes in the posterior and 
estimates. Substantial changes in posterior or estimates may lead to refinement 
of the model. Cross-validation also relies on case-deletion, as formalized by the 
conditional predictive ordinate (CPO) (see, for example, p. 47 and p. 284 of [4]). 

Case-deleted posterior distributions are examined through importance sam- 
pling. The large sample from the full posterior distribution is reweighted, as 
suggested for example in [21] and [23], to compute summaries with respect to 
the case-deleted posterior distribution. Examples of this and similar approaches 
are presented in [3, 15, 16, 25, 26] and [27]. As shown in [13] it is essential for 
the importance sampling weights to have finite variance. If the 2nd moment of 
the weights does not exist, typical estimators will not follow an $n^{1/2}$ asymptotic, 
nor will they follow a central limit theorem. 

It is shown in [19] that, for the case of a popular Bayesian linear model with 
conjugate priors, whether or not the weight function for a single case-deletion 
has finite 2nd moment depends on simple conditions involving the scale parame- 
ter of the prior distribution of the error variance, the leverage of the observation 
being deleted, its residual, and the total residual sum of squares. In this article, 
we expand upon the results of [19] in several directions. We first analyze the sit- 
uation of multiple case-deletions and provide necessary and sufficient conditions 
for the rth (r > 1) posterior moment of the weight function to be finite. This al- 
lows us to treat a group of observations coherently, thereby capturing synergistic 
effects of similar cases. We extend the results to much broader classes of prior 
distributions, so that we can handle nonconjugate as well as conjugate priors. 
This is accomplished by formally defining classes of distributions that are thick 
or thin tailed with respect to the conjugate priors. This extension is coupled 
with two devices, bounding functions and adjustment of the prior, to allow us 
to establish a connection between a finite rth moment of the weight function 
and the finiteness of the 2nd moment for a variety of functions. The existence 
of two moments for these functions implies that a central limit theorem holds 
for an estimator. As in [19], the conditions are on sample size, leverage and an 
adjusted residual sum of squares. 

In addition to the linear model, we provide results for the Michaelis-Menten 
(MM) model. The MM model is nonlinear, but has the property that, condi- 
tional on one parameter, the mean structure is linear in the remaining param- 
eters. Making use of conditional linearity, we develop uniform versions of the 
conditions for the linear model that ensure existence of the weight function's rth 
moment in the MM model. Many other models are conditionally linear (among 
them linear regression used in conjunction with Box-Cox transformations or 
linear regression along with Box-Tidwell transformations). We pursue further 
extensions of the linear model, deriving results for the logistic regression model. 

Our results have a very practical implication. They let us determine, quickly 
and analytically, whether central limit theorems hold for particular functionals. 
If central limit theorems hold, then we can pursue the strategy of fitting the 
model to the full data set and using importance sampling to estimate the func- 
tionals under case-deleted posteriors. If central limit theorems do not hold, we 
must alter our inferential strategy, either using more sophisticated importance 
sampling techniques (such as the importance link function technique introduced 
in [17]) or fitting the model for particular case-deleted data sets with separate 
Monte Carlo simulations. 

By providing conditions under which r moments of the case-deleted weight 
function exist, our theorems go beyond the typical central limit theorem results 
that rely on the existence of second moments. This is important for two reasons. 
First, one may be interested in functionals where higher order moments of the 
case-deleted weight function come into play (see, for example, estimation of 
divergence in [26] and [27]). Second, the number of moments which exist for 
the deletion of particular cases can be used as a measure of their influence, 
thus allowing one to assess influence along a continuum. The connection between 
influence and moment conditions is elucidated by applying results presented in 
[7] and [8], which, for an arbitrary, non-negative random variable X, contain the 
definition of a quantity called the moment index of X. Denoting by W the weight 
function resulting from the deletion of a given set of cases, its moment index r* 
is the least upper bound on the number of moments which exist. This represents 
a quantitative summary of the limiting tail behavior of the case-deleted weight 
function in the sense that, as stated in [7] and [8], 
$r^* = \liminf_{t \to \infty} [\log P(W > t)]/[\log(1/t)]$. A larger moment index 
corresponds to a larger class of functions for which the central limit theorem 
holds. Practical illustrations of these ideas are presented in Sections 4 and 6. 
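
As a worked illustration of the moment index (a textbook-style example, not one taken from [7] or [8]), suppose the weight $W \ge 1$ has an exact Pareto tail with a hypothetical tail exponent $a > 0$. The index then equals the tail exponent, and it marks the boundary between finite and infinite moments:

```latex
P(W > t) = t^{-a} \quad (t \ge 1)
\;\Longrightarrow\;
r^* = \liminf_{t \to \infty} \frac{\log t^{-a}}{\log(1/t)}
    = \liminf_{t \to \infty} \frac{-a \log t}{-\log t} = a ,
\qquad
E(W^r) = 1 + r \int_1^\infty t^{\,r-1-a}\, dt < \infty
\iff r < a .
```

Under this assumed tail, a second moment of the weight exists only when $a > 2$, and a fourth moment only when $a > 4$.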

This article is laid out as follows. Section 2 contains preliminary results and 
formal definitions of thick and thin tails. Section 3 provides conditions for the 
(non)existence of the rth moment of the case deletion weight function in the 
linear model. Section 4 gives conditions on moments' existence for the MM 
model, and Section 5 gives parallel results for the logistic regression model. Sec- 
tion 6 shows how the results can be used to establish central limit theorems. 
A summary of sufficient conditions on the weight function's moments to ensure 
a central limit theorem for several popular Bayesian measures of influence is 
presented in Table 2. The section also shows the results in action, investigating 
both measures of influence and their impact on model development in a mul- 
tiple linear regression setting. The final section contains concluding remarks. 
Technical details of proofs are left to the appendix. 
2. Notation and preliminary results 

Each Bayesian model considered in this article depends on a finite dimensional 
parameter vector $s = (s_1, \ldots, s_k)$. Suppose that a set of observations 
$y = (y_1, \ldots, y_n)$ is collected and let $p(s) = p(s \mid y)$ denote the full 
posterior density for $s$. Let $\mathcal{I}$ denote the set of indices to be deleted 
from the analysis and let $I$ be its cardinality. Let $y_{\setminus\mathcal{I}}$ 
represent the $n - I$ observations remaining after the indices in $\mathcal{I}$ are 
omitted, with $p_{\setminus\mathcal{I}}(s) = p_{\setminus\mathcal{I}}(s \mid y_{\setminus\mathcal{I}})$ 
denoting the corresponding case-deleted posterior density. Furthermore, let 
$q(s) = q(s, y)$ and $q_{\setminus\mathcal{I}}(s) = q(s, y_{\setminus\mathcal{I}})$ 
denote functions computable at every point $(s, y)$ and proportional to the joint 
prior densities (e.g., prior $\times$ data likelihood) of $(s, y)$ and 
$(s, y_{\setminus\mathcal{I}})$, respectively. 

Suppose that a sample $z_1, \ldots, z_M$ from $p(s)$ is available. In a typical 
application this will be either an independent sample or a dependent sample from 
an ergodic Markov chain. We wish to construct an estimate of 
$E_{p_{\setminus\mathcal{I}}}[g(s)] = \int g(s)\, p_{\setminus\mathcal{I}}(s)\, ds$, 
for some real valued function $g(s)$ such that 
$\int |g(s)|\, p_{\setminus\mathcal{I}}(s)\, ds < \infty$. This can be done by 
computing a Monte Carlo sum in which the individual elements $g(z_m)$ are 
reweighted. Typically, $p(s)$ and $p_{\setminus\mathcal{I}}(s)$ are not available 
because their normalizing constants are unknown and only $q(s)$ and 
$q_{\setminus\mathcal{I}}(s)$ are directly computable. In that case we can define 
the weight function $w_{\setminus\mathcal{I}}(s) = q_{\setminus\mathcal{I}}(s)/q(s)$ 
and estimate the expectation by: 

$\hat{E}_{p_{\setminus\mathcal{I}}}[g(s)] = \left( \sum_{m=1}^{M} w_{\setminus\mathcal{I}}(z_m)\, g(z_m) \right) \Big/ \left( \sum_{m=1}^{M} w_{\setminus\mathcal{I}}(z_m) \right)$. (2.1) 

The denominator in Equation (2.1) divided by $M$ estimates the ratio of the 
two unknown normalizing constants. Thus, if $p(s)$ and $p_{\setminus\mathcal{I}}(s)$ 
are available, $w_{\setminus\mathcal{I}}(s)$ can be replaced by 
$w^*_{\setminus\mathcal{I}}(s) = p_{\setminus\mathcal{I}}(s)/p(s)$ in the numerator 
and the denominator can be replaced by $M$, resulting in the related estimator 
that we denote by $\tilde{E}_{p_{\setminus\mathcal{I}}}[g(s)]$. In both cases, the 
resulting estimators are consistent under mild assumptions (see [13] for the case 
of i.i.d. samples and [24] for the case of samples from ergodic Markov chains). 
Throughout the article we refer to estimators of the form 
$\hat{E}_{p_{\setminus\mathcal{I}}}[g(s)]$ and $\tilde{E}_{p_{\setminus\mathcal{I}}}[g(s)]$ 
as case-deleted importance sampling estimators. 
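
The self-normalized estimator of Equation (2.1) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function and argument names are hypothetical, and working with log weights (shifted by their maximum) is an added numerical safeguard that leaves the ratio in (2.1) unchanged.

```python
import math

def case_deleted_estimate(draws, log_weight, g):
    """Self-normalized importance sampling estimate in the form of Equation (2.1).

    draws      -- sample z_1, ..., z_M from the full posterior p(s)
    log_weight -- function returning log w(z), known up to an additive constant
    g          -- real-valued function whose case-deleted expectation is wanted
    """
    log_w = [log_weight(z) for z in draws]
    shift = max(log_w)                       # subtract the max to avoid overflow
    w = [math.exp(lw - shift) for lw in log_w]
    numerator = sum(wi * g(z) for wi, z in zip(w, draws))
    denominator = sum(w)                     # the common shift cancels in the ratio
    return numerator / denominator
```

With constant weights the estimate reduces to the ordinary sample mean of $g(z_m)$, which is a quick sanity check on any implementation.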

The prior distribution plays a large role in determining whether the Estimator 
(2.1) is asymptotically normal. To ensure asymptotic normality for 
$\hat{E}_{p_{\setminus\mathcal{I}}}[g(s)]$, we need both 
$\int w^2_{\setminus\mathcal{I}}(s)\, g^2(s)\, p(s)\, ds < \infty$ and 
$\int w^2_{\setminus\mathcal{I}}(s)\, p(s)\, ds < \infty$. Finiteness of these 
integrals is unchanged by substitution of $w^*_{\setminus\mathcal{I}}(s)$ for 
$w_{\setminus\mathcal{I}}(s)$. (See Section 6 for further discussion of conditions 
for the asymptotic normality of both $\hat{E}_{p_{\setminus\mathcal{I}}}[g(s)]$ and 
$\tilde{E}_{p_{\setminus\mathcal{I}}}[g(s)]$.) In many instances, a prior 
distribution with sharp enough tails will ensure that these integrals are finite 
while a flatter tailed prior will lead to infinite integrals. 

The upcoming lemma enables us to work easily with priors having different 
tails. In particular, it enables us to derive preliminary results for conjugate 
prior distributions, and then to quickly extend the results to non-conjugate 
prior distributions. Use of the lemma is demonstrated in the examples. 
To set up the lemma, we first define the basic notation. Let 



for $i = 0, 1$. The functions $f$, $\pi_i$ and $h$ are assumed to be non-negative. The 
constants $c_i$ are assumed to be finite and positive. Let $0 < b < B < \infty$. 

Lemma 2.1. If, for all $x$, $\pi_0(x)/\pi_1(x) < B$, then $S_0 = \infty$ implies $S_1 = \infty$. 
If, for all $x$, $b < \pi_0(x)/\pi_1(x)$, then $S_0 < \infty$ implies $S_1 < \infty$. If, for all $x$, 
$b < \pi_0(x)/\pi_1(x) < B$, then $S_0 < \infty$ if and only if $S_1 < \infty$. 
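
A short sketch of why bounds of this kind hold. The display defining $S_i$ is not reproduced above, so the form $S_i = c_i \int f(x)\,\pi_i(x)\,h(x)\,dx$ used here is an assumption chosen to match the named ingredients ($f$, $\pi_i$, $h$, $c_i$); the argument only needs $S_i$ to be a positive multiple of an integral that is monotone in $\pi_i$.

```latex
\pi_0(x) \le B\,\pi_1(x) \ \text{for all } x
\;\Longrightarrow\;
S_0 = c_0 \int f\,\pi_0\,h \, dx
    \le c_0 B \int f\,\pi_1\,h \, dx
    = \frac{c_0 B}{c_1}\, S_1 ,
\qquad
b\,\pi_1(x) \le \pi_0(x) \ \text{for all } x
\;\Longrightarrow\;
S_1 \le \frac{c_1}{b\, c_0}\, S_0 .
```

Under that assumption, the first bound shows that $S_0 = \infty$ forces $S_1 = \infty$, the second shows that $S_0 < \infty$ forces $S_1 < \infty$, and the two together give the equivalence.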

A device that we have found useful is a formal description of thinner and 
thicker tailed distributions. Since the prior distributions that we consider here 
are all absolutely continuous with respect to Lebesgue measure on $\mathcal{R}^k$, we use a 
simple definition that suffices for our purposes. We describe the result in terms 
of a distribution for a parameter since that is how we will use the result. 

Consider a parameter $s \in S$. The parameter space $S$ is taken to be $\mathcal{R}^k$. Let 
$\mathcal{F}$ represent a set of distributions on $s$, all of which have densities with 
respect to Lebesgue measure. The following definition concerns the relationship 
between another distribution, $g$, and the set of distributions $\mathcal{F}$. 

Definition 2.1. The density $g$ is said to be thick-tailed with respect to $\mathcal{F}$ 
if, for each $f \in \mathcal{F}$ and for each sequence $s_t$ with $\|s_t\| \to \infty$ as 
$t \to \infty$, $\lim_{t \to \infty} g(s_t)/f(s_t) = \infty$. 

Definition 2.2. The density $g$ is said to be thin-tailed with respect to $\mathcal{F}$ 
if, for each $f \in \mathcal{F}$ and for each sequence $s_t$ with $\|s_t\| \to \infty$ as 
$t \to \infty$, $\lim_{t \to \infty} g(s_t)/f(s_t) = 0$. 

We note that these definitions capture the general notion of which distribu- 
tions are thicker or thinner tailed than others. For example, a t distribution 
will be thicker tailed than the class of normal distributions. A one-dimensional 
normal distribution will be thinner tailed than the Laplace distribution. A t dis- 
tribution with 5 degrees of freedom will be thicker tailed than a t distribution 
with 7 degrees of freedom, etc. We also note that a normal distribution with 
variance $\sigma^2$ is thicker tailed than a normal distribution with variance $c\sigma^2$ if 
c < 1. 

3. A Bayesian linear model 

In [19] the author considers a standard specification of the Bayesian linear 
model and derives necessary and sufficient conditions for the variance of the 
case-deleted importance sampling weight function to be finite when a single ob- 
servation is omitted. Loosely, the conditions for a finite variance stated in [19] 
can be described as (a) small leverage for the deleted case, (b) large enough 
sample size, and (c) small enough residual for the deleted case. In this section 
we extend the results of [19] in two different directions: we analyze the situation 
of multiple case-deletions and provide necessary and sufficient conditions for the 
rth (r > 1) posterior moment of the case-deleted weight function to be finite. 
Our conditions are also on leverage, sample size and residual. In addition, we 
extend the results to nonconjugate models by considering the tail behavior of 
the prior distribution. In Section 6, these results are used to establish central 
limit theorems for a broad class of importance sampling estimators. 
Let the $n \times 1$ vector of observations $Y$ be distributed as 

$Y \mid \theta, \sigma^2 \sim N(X\theta, \sigma^2 I)$, (3.1) 

where $I$ denotes the identity matrix and $X$ denotes an $n \times k$ design matrix of 
rank $k$. Assume that the variance $\sigma^2$, having an inverse gamma prior distribution 
with known positive parameters $\alpha$ and $\beta$, is independent of the $k \times 1$ vector of 
regression parameters $\theta = (\theta_0, \ldots, \theta_{k-1})^T$ having a proper prior 
density $\pi_1$ with full support $\mathcal{R}^k$, i.e., 

$\theta \sim \pi_1 \perp \sigma^2 \sim IG(\alpha, \beta)$. (3.2) 

To describe conditions under which moments of the case-deleted weight function 
exist, we introduce some additional quantities. Let $H = X(X^TX)^{-1}X^T$ 
and $RSS = y^T(I - H)y$ denote the projection matrix and the residual sum of 
squares from the least squares fit of the full data set, respectively. The index 
set, $\mathcal{I}$, consists of the indices of the $I$ cases to be deleted. Given the 
index set $\mathcal{I}$, let $Y_{\mathcal{I}}$ be the $I \times 1$ random vector of 
observations $Y_i$, with $i \in \mathcal{I}$, and let $X_{\mathcal{I}}$ be the 
$I \times k$ submatrix of the $I$ rows of $X$ indexed by $\mathcal{I}$. Define the 
leverage of set $\mathcal{I}$ to be the principal minor of $H$ corresponding to 
$\mathcal{I}$: $H_{\mathcal{I}} = X_{\mathcal{I}}(X^TX)^{-1}X_{\mathcal{I}}^T$, and define 
$e_{\mathcal{I}}$ to be the $I \times 1$ vector of the elements indexed by $\mathcal{I}$ 
in the vector of the ordinary residuals $e = (I - H)y$, i.e., 
$e_{\mathcal{I}} = y_{\mathcal{I}} - X_{\mathcal{I}}(X^TX)^{-1}X^Ty$. Finally, for each 
$r > 0$, if the $I \times I$ matrix $(I - rH_{\mathcal{I}})$ is non-singular, let 
$RSS^*_{\setminus\mathcal{I}}(r) = RSS - r\, e_{\mathcal{I}}^T (I - rH_{\mathcal{I}})^{-1} e_{\mathcal{I}}$. 
When $I = 1$, so that $\mathcal{I} = \{i\}$, $H_{\mathcal{I}} = x_i^T(X^TX)^{-1}x_i$ is the 
leverage of the ith observation, say $h_{ii}$, $e_i = y_i - x_i^T(X^TX)^{-1}X^Ty$ is 
the residual of observation $i$, and 
$RSS^*_{\setminus i}(r) = RSS - r\, e_i^2/(1 - r h_{ii})$. When $r = 1$, 
$RSS^*_{\setminus\mathcal{I}}(r)$ is the residual sum of squares from the least squares 
fit of the case-deleted data set. Letting $s = (\theta, \sigma^2)$, the unnormalized 
importance sampling weight function resulting from the deletion of the $I$ cases 
indexed by $\mathcal{I}$ is given by 

$w_{\setminus\mathcal{I}}(s) = (\sigma^2)^{I/2} \exp\{ 1/(2\sigma^2)\, (Y_{\mathcal{I}} - X_{\mathcal{I}}\theta)^T (Y_{\mathcal{I}} - X_{\mathcal{I}}\theta) \}$. (3.3) 

This functional form of the weight results from ignoring normalizing constants 
not depending on the model parameters and from canceling the common factors 
in the numerator and the denominator represented by the prior and by the 
portion of the Gaussian likelihood which corresponds to the undeleted cases. 
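
The weight in Equation (3.3) is the reciprocal of the Gaussian likelihood contribution of the deleted cases, up to a constant. A minimal sketch of its logarithm follows; the function name and the list-based argument layout are hypothetical choices made for illustration, and the log scale is used only to keep the computation numerically stable.

```python
import math

def log_weight_linear(theta, sigma2, x_deleted, y_deleted):
    """Log of the unnormalized weight in Equation (3.3).

    theta     -- list of k regression coefficients
    sigma2    -- error variance (must be positive)
    x_deleted -- rows of the design matrix for the deleted cases (list of lists)
    y_deleted -- responses for the deleted cases
    """
    n_deleted = len(y_deleted)
    sse = 0.0
    for row, y in zip(x_deleted, y_deleted):
        fitted = sum(xj * tj for xj, tj in zip(row, theta))
        sse += (y - fitted) ** 2             # squared residual of a deleted case
    # log[(sigma2)^(I/2)] + SSE_deleted / (2 * sigma2)
    return 0.5 * n_deleted * math.log(sigma2) + sse / (2.0 * sigma2)
```

The positive sign of the quadratic term is what makes the weight potentially heavy-tailed: draws of $\theta$ that fit the deleted cases poorly, paired with small $\sigma^2$, receive very large weight.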

For the Bayesian linear model specified by Equations (3.1) and (3.2) the 
following theorem holds. 
Theorem 3.1. Let $Y \mid \theta, \sigma^2 \sim N(X\theta, \sigma^2 I)$. Let 
$\lambda_1 \le \cdots \le \lambda_I$ denote the eigenvalues of $H_{\mathcal{I}}$ and assume 
that $\lambda_i \ne 1/r$, for all $i = 1, \ldots, I$. 

(i) If the prior distribution follows specification (3.2), then the case-deleted 
weight function $w_{\setminus\mathcal{I}}(s)$ has a finite rth moment with respect to the 
full posterior $p(s)$ if 

(a) $\lambda_I < 1/r$ and (b) $n/2 + \alpha > rI/2$ and (c) $RSS^*_{\setminus\mathcal{I}}(r) > -2/\beta$. 

Conversely, the rth moment of $w_{\setminus\mathcal{I}}(s)$ is infinite if 

(a') $\lambda_I > 1/r$ or (b') $n/2 + \alpha < rI/2$ or (c') $RSS^*_{\setminus\mathcal{I}}(r) < -2/\beta$. 

(ii) If the noninformative prior $\pi(\theta, \sigma^2) \propto 1/\sigma^2$ is used, then 
conditions (a) and (a') remain unchanged, and conditions (b), (c), (b') and (c') 
become: (b) $n > rI + k$, (b') $n < rI + k$, (c) $RSS^*_{\setminus\mathcal{I}}(r) > 0$ and 
(c') $RSS^*_{\setminus\mathcal{I}}(r) < 0$. 

Remark 3.1. Theorem 3.1 includes the problem investigated in [19] as a special 
case. There, the author takes $r = 2$ and specifies the prior distribution on 
$(\theta, \sigma^2)$ as $\theta \mid \Sigma \sim N(\theta_0, \Sigma)$, 
$\sigma^2 \sim IG(\alpha, \beta)$ and $\Sigma \sim IW(\nu R, \nu)$, with conditional 
independence at all stages of the model. The parameter $\theta_0 \in \mathcal{R}^k$ is a 
known mean vector, $\alpha$ and $\beta$ are known positive constants, and 
$IW(\nu R, \nu)$ is an inverse Wishart distribution with $\nu$ a known integer greater 
than or equal to $k$ and $R$ a known $k \times k$ positive definite matrix. 

Remark 3.2. The statement of Theorem 3.1 involves the eigenvalues of the 
$I \times I$ matrix $H_{\mathcal{I}}$. In typical applications, the cardinality $I$ of the 
set of observations being deleted will be fairly small and the calculation of the 
eigenvalues can be accomplished quickly with standard software. For the 
illustrative examples presented in the article, we computed all eigenvalues using 
the R function eigen(). 
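
For a single deleted case the only eigenvalue of $H_{\mathcal{I}}$ is the leverage $h_{ii}$, so the checks of Theorem 3.1(ii) need no eigenvalue routine. The sketch below is an illustration only: it assumes a regression through the origin ($k = 1$) and the noninformative prior, and all function names are hypothetical. It is written in Python rather than R purely for self-containment.

```python
def fit_no_intercept(x, y):
    """Least squares fit of y = theta * x; returns (theta_hat, sum of x^2, RSS)."""
    sxx = sum(xi * xi for xi in x)
    theta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - theta_hat * xi) ** 2 for xi, yi in zip(x, y))
    return theta_hat, sxx, rss

def adjusted_rss(x, y, i, r):
    """Return (RSS - r * e_i^2 / (1 - r * h_ii), h_ii) for deletion of case i.

    Assumes h_ii != 1/r, as in the statement of Theorem 3.1.
    """
    theta_hat, sxx, rss = fit_no_intercept(x, y)
    h_ii = x[i] ** 2 / sxx                   # leverage of case i
    e_i = y[i] - theta_hat * x[i]            # ordinary residual of case i
    return rss - r * e_i ** 2 / (1.0 - r * h_ii), h_ii

def finite_rth_moment_noninformative(x, y, i, r, k=1):
    """Sufficient conditions (a)-(c) of Theorem 3.1(ii) with I = 1."""
    rss_star, h_ii = adjusted_rss(x, y, i, r)
    n = len(y)
    return h_ii < 1.0 / r and n > r + k and rss_star > 0.0
```

With `r = 1` the adjusted quantity returned by `adjusted_rss` coincides with the residual sum of squares of the fit that omits case `i`, which gives a direct numerical check of the formula.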

Theorem 3.1 holds for any proper prior distribution on $\theta$ having full support 
on $\mathcal{R}^k$, provided the parameters $\theta$ and $\sigma^2$ are independent and the 
prior for $\sigma^2$ is $IG(\alpha, \beta)$. This follows from the form of the likelihood 
function, which, for fixed $\sigma^2$, is an exponential function with quadratic 
argument in $\theta$, and, for fixed $\theta$, is the product of a power and an 
exponential function in $1/\sigma^2$. Recognizing a connection with the integral 
needed to normalize the kernel of an inverse gamma density suggests how to extend 
the results to the case of non-conjugate prior distributions. The next two 
corollaries make this extension, placing the focus on the tails of the prior 
distribution. 

The corollaries assume independence between $\theta$ and $\sigma^2$, and so we consider 
their tail behavior separately. Let $\pi_{11}$ denote a (proper) prior distribution 
on $\theta$, and let $\mathcal{F}_1$ be the family of all nondegenerate multivariate 
normal distributions on $\mathcal{R}^k$. Corollary 3.2 distinguishes between priors that 
are thick-tailed with respect to $\mathcal{F}_1$ and those that are not. Let $\pi_{12}$ 
denote a (proper) prior distribution on $\sigma^2$, and let $\mathcal{F}_2$ be the family 
of all inverse gamma distributions, $IG(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$. 
Exploiting the connection mentioned in the previous paragraph, the proof of 
Theorem 3.1 shows that conditions (a), (a'), (c) and (c') determine the 
integrability (or lack thereof) of a certain function of $\sigma^2$ in a neighborhood 
of zero. For $\sigma^2$ going to infinity, a suitable number of observations 
guarantees integrability. Thus, the corollaries focus on the tail for $\sigma^2$ near 
0, or the tail for the precision, $1/\sigma^2$, tending to $\infty$. A distribution 
$\pi_{12}$ which is thick-tailed with respect to $\mathcal{F}_2$ has the property that 
$\lim_{\sigma^2 \to 0} \pi_{12}(\sigma^2)/\pi_{02}(\sigma^2) = \infty$, for all 
$\pi_{02} \in \mathcal{F}_2$; a distribution that is thin-tailed with respect to 
$\mathcal{F}_2$ satisfies $\lim_{\sigma^2 \to 0} \pi_{12}(\sigma^2)/\pi_{02}(\sigma^2) = 0$, 
for all $\pi_{02} \in \mathcal{F}_2$. 

Before proceeding, we summarize the notational conventions just introduced 
and the assumptions common to both corollaries. 

1. $\mathcal{F}_1$ denotes the family of all nondegenerate multivariate normal 
distributions on $\mathcal{R}^k$. 

2. $\mathcal{F}_2$ denotes the family of all inverse gamma distributions. 

3. $\theta$ and $\sigma^2$ are assumed to be independent. 

4. $\pi_{11}$ denotes a prior distribution for $\theta$ having full support $\mathcal{R}^k$. 

5. $\pi_{12}$ denotes a prior distribution for $\sigma^2$ such that 
$\int (\sigma^2)^{(rI - n)/2}\, \pi_{12}(\sigma^2)\, d\sigma^2 < \infty$. 

6. $\lambda_1 \le \cdots \le \lambda_I$ denote the eigenvalues of $H_{\mathcal{I}}$, assumed 
to satisfy $\lambda_i \ne 1/r$, for all $i = 1, \ldots, I$. 

The first corollary deals with thick-tailed prior distributions $\pi_{12}$ on 
$\sigma^2$ and covers the case of all proper prior distributions $\pi_{11}$ on $\theta$. 

Corollary 3.1. Assume 1.-6. above and let $\pi_{12}(\sigma^2)$ be thick-tailed with 
respect to $\mathcal{F}_2$. If $\lambda_I < 1/r$ and $RSS^*_{\setminus\mathcal{I}}(r) > 0$, then 
the case-deleted weight function has finite rth moment with respect to the full 
posterior distribution. On the other hand, if $\lambda_I > 1/r$ or 
$RSS^*_{\setminus\mathcal{I}}(r) < 0$, then the rth moment of the case-deleted weight 
function is infinite. 

The next corollary applies to thin-tailed distributions $\pi_{12}(\sigma^2)$. It 
provides only a sufficient condition if $\pi_{11}(\theta)$ is thin-tailed and 
necessary and sufficient conditions if $\pi_{11}(\theta)$ is thick-tailed with respect 
to $\mathcal{F}_1$. 

Corollary 3.2. Assume 1.-6. above and let $\pi_{12}(\sigma^2)$ be thin-tailed with 
respect to $\mathcal{F}_2$. If $\lambda_I < 1/r$, then the case-deleted weight function 
has finite rth moment with respect to the full posterior distribution. If 
$\pi_{11}(\theta)$ is thick-tailed with respect to $\mathcal{F}_1$, then $\lambda_I > 1/r$ 
implies that the case-deleted weight function has infinite rth moment. 

If $\lambda_I > 1/r$ and both the prior $\pi_{11}$ on $\theta$ and the prior $\pi_{12}$ on 
$\sigma^2$ are thin-tailed, we cannot draw any conclusions about the finiteness of 
the full posterior rth moment of $w_{\setminus\mathcal{I}}(\theta, \sigma^2)$, as shown in the 
following example. 

Example 3.1. Consider the univariate regression model $y_j \sim N(\theta x_j, \sigma^2)$ with 
no intercept and with prior distribution on 0, 7rii(0) cx exp{— — ^o)^}- Sup- 
pose we observe a sample with ith leverage ha = l/2-\- 1/X]j=i ^j- Although 
hii > 1/2, if the prior distribution on cr^ is 7ri2(cr^) cx exp{— (cr^)~^ — a^}, the 
posterior second moment E{w'^^{9, (T^)|y) is finite. On the other hand, if the prior 

distribution on cr^ is 7r22(cr^) oc exp{— (ct^)~^/^ — tr^}, then E{w^-{9, cr^)|y) = 00. 
Both prior distributions $\pi_{12}$ and $\pi_{22}$ are thin-tailed with respect to $\mathcal{F}_2$. 

Finally, consider the case of a prior distribution $\pi_{11}(\theta)$ having bounded 
support. Arguing as in the proof of Corollary 3.2, it is easy to verify that the 
rth moment $E(w^r_{\setminus\mathcal{I}}(\theta, \sigma^2) \mid y)$ always exists if 
$\pi_{12}(\sigma^2)$ is thin-tailed with respect to $\mathcal{F}_2$. On the other hand, if 
$\pi_{12}(\sigma^2)$ is either in $\mathcal{F}_2$ or is thick-tailed with respect to 
$\mathcal{F}_2$, the finiteness of $E(w^r_{\setminus\mathcal{I}}(\theta, \sigma^2) \mid y)$ 
depends essentially on the value of $RSS^*_{\setminus\mathcal{I}}$. 

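More precisely, let 
$M = \min_{\theta \in \mathrm{support}(\pi_{11})} \theta^T (X^TX - r X_{\mathcal{I}}^T X_{\mathcal{I}}) \theta - 2 (y^T X - r y_{\mathcal{I}}^T X_{\mathcal{I}}) \theta$. 
Then one can prove the following: 

- if $\pi_{12}(\sigma^2) = IG(\alpha, \beta)$, then 
$RSS^*_{\setminus\mathcal{I}}(r) > -(2/\beta + M)$ and $n/2 + \alpha > rI/2$ imply 
$E(w^r_{\setminus\mathcal{I}}(\theta, \sigma^2) \mid y) < \infty$, while 
$RSS^*_{\setminus\mathcal{I}}(r) < -(2/\beta + M)$ or $n/2 + \alpha < rI/2$ implies 
$E(w^r_{\setminus\mathcal{I}}(\theta, \sigma^2) \mid y) = \infty$; 

- if $\pi_{12}(\sigma^2)$ is thick-tailed with respect to $\mathcal{F}_2$, then 
$RSS^*_{\setminus\mathcal{I}}(r) > -M$ and 
$\int (\sigma^2)^{-(n - rI)/2}\, \pi_{12}(\sigma^2)\, d\sigma^2 < \infty$ imply 
$E(w^r_{\setminus\mathcal{I}}(\theta, \sigma^2) \mid y) < \infty$, while 
$RSS^*_{\setminus\mathcal{I}}(r) < -M$ or 
$\int (\sigma^2)^{-(n - rI)/2}\, \pi_{12}(\sigma^2)\, d\sigma^2 = \infty$ implies 
$E(w^r_{\setminus\mathcal{I}}(\theta, \sigma^2) \mid y) = \infty$. 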

4. A nonlinear model 

To illustrate some of the issues that arise when the fitted model is nonlinear, we 
revisit a Bayesian analysis of the Puromycin data presented in [17]. The data 
come from a biochemical reaction and are described in [-"i], p. 425. For a group 
of cells not treated with the drug Puromycin, there are $n = 11$ measurements 
of the initial velocity of a reaction, Vi, obtained when the concentration of the 
substrate was set at a given positive value, Ci. The observations are recorded 
in Table 1 and plotted in Figure 1. The Bayesian model fit in [17] assumes a 
nonlinear regression of velocity on concentration given by the Michaelis-Menten 
(MM) relation: 

$E(v_i) = (m c_i)/(\kappa + c_i)$. (4.1) 

According to this relation, when the concentration of the substrate equals the 
Michaelis parameter, $\kappa$, the velocity reaches half of its maximal value, $m$, which 
is also the limiting velocity as the concentration goes to infinity. 

Following [17], we model the n observations as independent realizations from 
normal distributions with means given by Equation (4.1) and common variance 
$\sigma^2$. All three parameters $m$, $\kappa$, and $\sigma^2$ are constrained to be positive and their 

Table 1 

The Puromycin Data and Related Case-Deleted Quantities. The bottom row contains the 
moment index $r^*$, i.e., the least upper bound on the value of $r$ such that 
$E(w^r_{\setminus i}(m, \sigma^2, \kappa) \mid v) < \infty$. 

Case No. i          1     2     3     4     5     6     7     8     9    10    11
Concentration    0.02  0.02  0.06  0.06  0.11  0.11  0.22  0.22  0.56  0.56   1.1
Velocity           67    51    84    86    98   115   131   124   144   158   160
                 1.97  1.97  1.96  1.96  1.94  1.94  1.87  1.87  1.34  1.34 -0.45
moment index r*  1.59  2.79  4.48  5.17  2.86  5.19  6.38  5.26  3.77  2.81  1.32
Fig 1. Scatterplot of the Puromycin Data Set. The curve represents a fit of the expected 
velocity based on the posterior means of m and k. 



prior distribution is specified as 
$\pi(m, \kappa, \sigma^2) = \pi_1(m, \sigma^2)\, \pi_2(\kappa)$, with 
$\pi_1(m, \sigma^2) \propto 1/\sigma^2$ representing a noninformative prior density for 
$(m, \sigma^2)$ and $\pi_2$ representing a proper prior density for $\kappa$ such that 
$\int \kappa\, \pi_2(\kappa)\, d\kappa < \infty$. This requirement guarantees that the 
posterior is proper. 

The MM model is, conditional on $\kappa$, a linear regression model with no 
intercept and covariate $x_i(\kappa) := c_i/(\kappa + c_i)$, for $i = 1, \ldots, n$. Thus, 
for fixed $\kappa$, if the support of $m$ were $\mathcal{R}$, we could apply 
Theorem 3.1(ii). The case-deleted weight function is 
$w_{\setminus\mathcal{I}}(m, \kappa, \sigma^2) = (\sigma^2)^{I/2} \exp\{ \sum_{i \in \mathcal{I}} [v_i - m x_i(\kappa)]^2 / (2\sigma^2) \}$. 
Noting that $h_{ii}$ and $e_i$ are continuous functions of $\kappa$, we see that when 
Theorem 3.1(ii) indicates an infinite conditional rth moment at some value 
$\kappa_0$, it also indicates an infinite conditional moment in an open interval 
about $\kappa_0$. If the prior on $\kappa$ has full support, this interval receives 
positive posterior probability, and so the unconditional rth moment is infinite. 
The analog of Theorem 3.1 for the Michaelis-Menten model will impose conditions on 
leverage, sample size and residual. 

The unconditional rth moment may be infinite for a different reason: the 
finite conditional rth moments may integrate to infinity. Thus, the conditions 
will need to be strengthened to ensure a finite rth moment. To avoid this route 
to infinity, the conditions on leverage and residual are applied uniformly in 
$\kappa$. Finally, an apparent infinite moment will sometimes be finite due to the 
restriction on the support of m. 

Define the conditional design matrix $X(\kappa)$. Proceeding as in Section 3, define 
the matrix $H_{\mathcal{I}}(\kappa)$, and concentrate on its largest eigenvalue. The 
conditional leverage, 
$l(\mathcal{I}, \kappa) = \sum_{i \in \mathcal{I}} x_i^2(\kappa) / \sum_{i=1}^{n} x_i^2(\kappa)$, 
is the only non-zero eigenvalue. The condition on the residual can be expressed in 
terms of simpler functions which will prove useful later. Define 
$A(\mathcal{I}, r, \kappa) = \sum_{i \notin \mathcal{I}} x_i^2(\kappa) - (r - 1) \sum_{i \in \mathcal{I}} x_i^2(\kappa)$, 
$B(\mathcal{I}, r, \kappa) = \sum_{i \notin \mathcal{I}} x_i(\kappa) v_i - (r - 1) \sum_{i \in \mathcal{I}} x_i(\kappa) v_i$ 
and $C(\mathcal{I}, r) = \sum_{i \notin \mathcal{I}} v_i^2 - (r - 1) \sum_{i \in \mathcal{I}} v_i^2$. 
The adjusted, conditional residual sum of squares is 
$RSS^*_{\setminus\mathcal{I}}(r, \kappa) = C(\mathcal{I}, r) - B^2(\mathcal{I}, r, \kappa)/A(\mathcal{I}, r, \kappa)$. 
The set of zeroes of $A(\mathcal{I}, r, \kappa)$, with $\mathcal{I}$ and $r$ held fixed, 
contains at most $2(n - 1)$ points, a set of Lebesgue measure 0, and so we need not 
worry about the apparent division by 0. One last quantity is needed to handle the 
partial support of $m$. Define 
$g(\mathcal{I}, \kappa) = \sum_{i \in \mathcal{I}} x_i(\kappa) v_i / \sum_{i=1}^{n} x_i(\kappa) v_i$, 
the product of covariate and response summed over the deleted cases divided 
by the same quantity summed over all cases. 
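
The conditional quantities just defined are easy to evaluate numerically for the data in Table 1. The sketch below is illustrative only: the function name, the dictionary keys and the 0-based indexing of cases are hypothetical choices, and the code assumes $A(\mathcal{I}, r, \kappa) \ne 0$ at the value of $\kappa$ supplied.

```python
# Concentrations and velocities from Table 1 (cases 1..11 are indices 0..10 here).
CONC = [0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.1]
VEL = [67, 51, 84, 86, 98, 115, 131, 124, 144, 158, 160]

def mm_quantities(deleted, r, kappa, conc=CONC, vel=VEL):
    """Leverage, g, A, B, C and adjusted RSS for the 0-based index set `deleted`."""
    x = [c / (kappa + c) for c in conc]      # conditional covariate x_i(kappa)
    sxx_all = sum(xi * xi for xi in x)
    sxv_all = sum(xi * vi for xi, vi in zip(x, vel))
    svv_all = sum(vi * vi for vi in vel)
    sxx_del = sum(x[i] ** 2 for i in deleted)
    sxv_del = sum(x[i] * vel[i] for i in deleted)
    svv_del = sum(vel[i] ** 2 for i in deleted)
    # Sum over retained cases minus (r - 1) times the sum over deleted cases
    # equals the sum over all cases minus r times the sum over deleted cases.
    a = sxx_all - r * sxx_del
    b = sxv_all - r * sxv_del
    c = svv_all - r * svv_del
    return {
        "leverage": sxx_del / sxx_all,
        "g": sxv_del / sxv_all,
        "A": a,
        "B": b,
        "C": c,
        "rss_star": c - b * b / a,
    }
```

Scanning such a function over a grid of $\kappa$ values gives a simple numerical check of uniform conditions on leverage and residual; a grid search of this kind is a convenience suggested here, not a procedure described in the text.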

The results on the finiteness of $E(w^r_{\setminus\mathcal{I}}(m, \sigma^2, \kappa) \mid v)$ 
are summarized in the following theorem. 

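Theorem 4.1. Let $\pi_2$ be a proper prior distribution on $\kappa$ such that 

$\int_{\mathcal{R}^+} \kappa\, \pi_2(\kappa)\, d\kappa < \infty$. 

Suppose that 

(a) there exists a measurable set $\mathcal{N}$ such that $\pi_2(\mathcal{N}) = 0$ and 
$\sup_{\kappa \in \mathcal{N}^c} l(\mathcal{I}, \kappa) < 1/r$ 

and 

(b) $n > rI + 1$. 

If, in addition, 

(c) $\inf_{\kappa \in \mathcal{N}^c} \{RSS^*_{\setminus\mathcal{I}}(r, \kappa)\} > 0$ 

or 

(d) $C(\mathcal{I}, r) > 0$ and $\inf_{\kappa \in \mathcal{N}^c} g(\mathcal{I}, \kappa) > 1/r$ 

holds, then the case-deleted weight function $w_{\setminus\mathcal{I}}(m, \kappa, \sigma^2)$ has 
finite rth moment with respect to the full posterior $p(m, \kappa, \sigma^2)$. 
On the other hand, either of the conditions 

(e) $A(\mathcal{I}, r, \kappa) m^2 - 2B(\mathcal{I}, r, \kappa) m + C(\mathcal{I}, r) < 0$ for all 
$(m, \kappa)$ in some non-negligible set 

or 

(f) $n < rI + 1$ 

is sufficient for the rth moment of the importance sampling weight function to 
be infinite. 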

Remark 4.1. The sufficient conditions of Theorem 4.1 are essentially necessary 
for the rth posterior moment of the case-deleted weight function to be finite. If 
there exists a non-negligible set of values of $\kappa$ such that 

(a') $l(\mathcal{I}, \kappa) > 1/r$ 

or 

(c') $RSS^*_{\setminus\mathcal{I}}(r, \kappa) < 0$ and (d') $C(\mathcal{I}, r) < 0$ or 
$g(\mathcal{I}, \kappa) < 1/r$, 

then condition (e) is satisfied, and vice versa. This is because 
$l(\mathcal{I}, \kappa) > 1/r$ if and only if $A(\mathcal{I}, r, \kappa) < 0$, 
$B^2(\mathcal{I}, r, \kappa) - C(\mathcal{I}, r) A(\mathcal{I}, r, \kappa)$ is the discriminant of 
the quadratic equation 
$A(\mathcal{I}, r, \kappa) m^2 - 2B(\mathcal{I}, r, \kappa) m + C(\mathcal{I}, r) = 0$, and 
$g(\mathcal{I}, \kappa) < 1/r$ if and only if $B(\mathcal{I}, r, \kappa) > 0$. 

Remark 4.2. For a general $r > 1$, upper bounds for the leverages 
$l(\mathcal{I}, \kappa)$ and lower bounds for the functions $g(\mathcal{I}, \kappa)$ and for 
the marginal residual sums of squares $RSS^*_{\setminus\mathcal{I}}(r, \kappa)$ are hard to 
derive analytically, but numerical verification of the conditions of Theorem 4.1 is 
rather simple. In fact, $l(\mathcal{I}, \kappa) \ne 1/r$ if and only if 
$A(\mathcal{I}, r, \kappa) \ne 0$ and $g(\mathcal{I}, \kappa) \ne 1/r$ if and only if 
$B(\mathcal{I}, r, \kappa) \ne 0$. Moreover, for $\kappa > 0$, $A(\mathcal{I}, r, \kappa)$, 
$B(\mathcal{I}, r, \kappa)$ and 
$C(\mathcal{I}, r) A(\mathcal{I}, r, \kappa) - [B(\mathcal{I}, r, \kappa)]^2$ are continuous 
functions that approach zero as $\kappa$ goes to infinity and that can only have a 
finite number of extrema. The latter can be found among the real positive roots of 
polynomials of degree $2n - 3$, $n - 2$ and $2n - 3$ respectively. 

Remark 4.3. The strategy applied to the MM model applies to an array of 
models that are, conditional on some set of parameters, linear. We impose the 
leverage and residual conditions uniformly across the parameters that render the 
model nonlinear. Important classes of models are linear regression models that 
allow for Box-Cox transformation of the response variable and/or Box-Tidwell 
transformation of the explanatory variables. 

The authors of [17] specified a t distribution on 3 degrees of freedom restricted 
to $[0, +\infty)$ as a prior $\pi_2$ for the parameter $\kappa$ and fit the model to the Puromycin 
data using the program BUGS (see [22]). (Due to some technical restrictions 
of BUGS, they had to use approximations for some of the prior specifications.) 
They considered deletion of single cases and computed the corresponding case- 
deleted weight functions. They reported detailed estimation results based on 
the deletion of case 1, an observation that produces highly variable realized 
weight functions, and illustrated how a transformation based approach (the 
Importance Link Function method) can effectively reduce the variability of the 
weight functions and lead to improved estimation. 

We discuss the implications of the results developed in this section on the 
analysis presented in [17]. We consider, as was done in [17], deletion of single 
observations and focus on the case $r = 2$, so that the sample size condition (b) 
of Theorem 4.1 is satisfied for all cases. An examination of the leverage 
condition (a) shows that observation 11 has large leverage (for $\kappa = 2$, 
$l(\mathcal{I}, \kappa) = 0.5065 > 1/r = 1/2$), and so by Remark 4.1, the posterior 
variance of the case-deleted weight function for case 11 is infinite. All remaining 
cases have leverages bounded away from 0 and strictly bounded above by 1/2, and so 
satisfy condition (a). 

Turning to the residual conditions (c) and (d), we find that all observations 
other than 1 and 11 satisfy condition (c), thus ensuring finite variances for their 
case-deleted weight function. For observation 1, the adjusted residual sum of
squares is negative for values of $\kappa$ near 0.08, violating condition (c). Condition (d)
is also violated since $\sup_{\kappa > 0} g(1, \kappa) = 0.05501 < 1/2$. Consequently,
Theorem 4.1 implies that the case-deleted weight function for observation 1 has 
infinite variance. 

In addition to $r = 2$, we can examine other moments of the case-deleted
weight functions. Table 1 displays, for every case-deletion $i$, the moment index
$r^*$, i.e., the least upper bound on the value of $r$ for which the $r$th moment
exists (see [7] and [8]). If the influence of the $i$th observation on the
posterior distribution $p(m, \sigma^2, \kappa)$ is assessed by the $\chi^2$ divergence measure between
the case-deleted and full posteriors,

$\chi^2 = \int [p_{\setminus i}(m, \sigma^2, \kappa)/p(m, \sigma^2, \kappa) - 1]^2 \, p(m, \sigma^2, \kappa) \, dm \, d\sigma^2 \, d\kappa$,

then, as suggested in [26] and [27], we can estimate
$\chi^2$ by means of the Monte Carlo sum appearing in Table 2. As indicated in Section 6,
this estimator is asymptotically normal if $E(w_{\setminus i}^4(s) \mid y) < \infty$. According
to the values of $r^*$ displayed in Table 1, a central limit theorem holds only for
the estimators of $\chi^2$ corresponding to observations 3, 4, 6, 7, and 8.
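
As a minimal sketch of the Monte Carlo sum mentioned above, the following Python function computes a chi-square divergence estimate from realized case-deleted weights. The function name and the use of plain lists are illustrative assumptions; the weights are taken to be unnormalized and evaluated at draws from the full posterior, and R denotes their sample mean.

```python
def chi_square_divergence(weights):
    """Estimate the chi-square divergence between the case-deleted and
    full posteriors from unnormalized weights w_1, ..., w_M.

    The estimate is sum_m (w_m / R - 1)^2 / M, with R the mean weight.
    """
    m = len(weights)
    r_bar = sum(weights) / m  # R: estimates the normalizing constant
    return sum((w / r_bar - 1.0) ** 2 for w in weights) / m
```

Constant weights give an estimate of zero, as expected when deleting the case leaves the posterior unchanged. The central limit theorem for this estimate relies on a finite fourth moment of the weights, so a small estimate obtained from highly variable weights should be read with caution.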



Table 2

Influence Measures. This table presents a selection of influence measures and sufficient conditions for their estimators to follow a central limit theorem (see Section 6). KL represents Kullback-Leibler divergence, L1 is integrated $L_1$ loss, L2 is integrated $L_2$ loss, $\Delta 1$ is change in first moment of a parameter $\theta$, $\Delta 2$ is change in second moment of a parameter $\theta$, Hel is Hellinger distance, ChSq is chi-square distance, CPO is the Conditional Predictive Ordinate, and Bdd is a bounded function. As a shorthand for the notation introduced in Section 2, a subscript $m$ means that a function is evaluated at $z_m$ (e.g., $w_m = w(z_m)$, etc.). The symbol $L_I(s)$ represents the likelihood of the observations in $I$ evaluated at the point $s$, $L_{\setminus I}$ represents the likelihood of the observations not in set $I$, and $\pi$ represents the prior density. The expression $2 + \delta$ in the table means that it is sufficient, for some $\delta > 0$, that $2 + \delta$ moments exist. $R = \sum_{m=1}^{M} w_m / M$. $\hat{C}$ is an estimator of $C = \int q(s) \, ds$. There are many estimators of $C$, with some based on a different simulation than that used to fit the model. In lines 2 and 3 of the table, we assume that $\hat{C}$ is sufficiently well behaved that it does not prevent the estimators from following central limit theorems. In line 8 of the table, we presume that $w(s) = 1/L_I(s)$.

Meas | Estimand | Estimator | Mom's | Adjmnt | Adj-Mom's
KL | $\int \log(p_{\setminus I}(s)/p(s)) \, p_{\setminus I}(s) \, ds$ | $R^{-1} \sum_{m=1}^{M} w_m \log(w_m)/M - \log(R)$ | $2 + \delta$ | n.a. | n.a.
L1 | $\int \lvert p_{\setminus I}(s) - p(s) \rvert \, p_{\setminus I}(s) \, ds$ | $\hat{C}^{-1} R^{-1} \sum_{m=1}^{M} q_m w_m \lvert R^{-1} w_m - 1 \rvert / M$ | 2 | $\pi^2 L_{\setminus I}^2$ | 2
L2 | $\int (p_{\setminus I}(s) - p(s))^2 \, p_{\setminus I}(s) \, ds$ | $\hat{C}^{-2} R^{-1} \sum_{m=1}^{M} q_m^2 (R^{-1} w_m - 1)^2 w_m / M$ | 2 | $\pi^2 L_{\setminus I}^2$ | 4
$\Delta 1$ | $\int \theta \, p_{\setminus I}(s) \, ds - \int \theta \, p(s) \, ds$ | $\sum_{m=1}^{M} \theta_m (R^{-1} w_m - 1)/M$ | 2 | $\theta^2$ | 2
$\Delta 2$ | $\int \theta^2 \, p_{\setminus I}(s) \, ds - \int \theta^2 \, p(s) \, ds$ | $\sum_{m=1}^{M} \theta_m^2 (R^{-1} w_m - 1)/M$ | 2 | $\theta^4$ | 2
Hel | $\int (\sqrt{p(s)} - \sqrt{p_{\setminus I}(s)})^2 \, ds$ | $2 - 2 R^{-1/2} \sum_{m=1}^{M} \sqrt{w_m}/M$ | 2 | n.a. | n.a.
ChSq | $\int (p_{\setminus I}(s)/p(s) - 1)^2 \, p(s) \, ds$ | $\sum_{m=1}^{M} (R^{-1} w_m - 1)^2/M$ | 4 | n.a. | n.a.
CPO | $\int L_I(s) \, p_{\setminus I}(s) \, ds$ | $R^{-1}$ | 2 | |
Bdd | $\int g(s) \, p_{\setminus I}(s) \, ds$ | $\sum_{m=1}^{M} R^{-1} g_m w_m / M$ | 2 | |

5. Bayesian logistic regression

We now switch our focus to generalized linear models, concentrating on the 
study of a logistic regression model. Assume that, for each of $n$ subjects, we have
available a $k \times 1$ vector of covariate information, $x_i$, and we observe a 0-1
outcome, $Y_i$. Suppose that the $Y_i$ are independently distributed as Bernoulli random
variables taking on value 1 with probability $p_i = \exp\{\beta^T x_i\}/[1 + \exp\{\beta^T x_i\}]$.
The case-deleted weight function is proportional to the reciprocal of the likelihood
of the deleted cases,

$w_{\setminus I}(\beta) \propto \prod_{i \in I} p_i^{-Y_i} (1 - p_i)^{-(1 - Y_i)}$.    (5.1)

The following theorem covers prior distributions with the exponential tails that 
match the logistic regression likelihood. Subsequent corollaries cover thinner and 
thicker tailed prior distributions. 

Theorem 5.1. Let the data follow the logistic regression model just described,
and assume that we have a prior distribution for $\beta$ with density proportional to
$\pi(\beta) \propto \exp\{-\epsilon |\beta^T| 1\}$, where $\epsilon > 0$ is given and $|\beta^T| 1 = \sum_i |\beta_i|$. Define

$h(\beta, r, \epsilon) = -\sum_{i \notin I} x_i^T \beta \, \mathbf{1}(\beta^T x_i > 0) + (r - 1) \sum_{i \in I} x_i^T \beta \, \mathbf{1}(\beta^T x_i > 0) - \epsilon |\beta^T| 1$.

If, for all vectors $\beta$ such that $|\beta^T| 1 = 1$, $h(\beta, r, \epsilon) < 0$, then the case-deleted
weight function $w_{\setminus I}(\beta)$ has finite $r$th moment with respect to the full posterior
$p(\beta)$. If, for some vector $\beta$ such that $|\beta^T| 1 = 1$, $h(\beta, r, \epsilon) > 0$, then the case-deleted
weight function has infinite $r$th moment.

The theorem can be applied to prior distributions proportional to
$\exp\{-|\beta^T| \epsilon\}$, where $\epsilon$ is a vector of positive numbers. In this instance, a rescaling
of the covariates to obtain a prior with a single real $\epsilon$ results in the type of
prior for which the theorem is stated.

The theorem may be strengthened somewhat by explicitly considering the
case of $\max_{\beta : |\beta^T| 1 = 1} h(\beta, r, \epsilon) = 0$, although the statement of precise conditions
under which the case-deleted $r$th moment is infinite becomes messy. The
conditions in Theorem 5.1 are easy to check since the maximum of $h(\beta, r, \epsilon)$
may be found via linear programming methods.
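
The linear programming step is not reproduced here. A lighter check can be sketched in Python: evaluate the criterion of Theorem 5.1 along the coordinate directions only, each of which has unit $L_1$ norm. A positive value in any such direction is enough to conclude that the $r$th moment is infinite, whereas negative values in all of them prove nothing, since the maximum may lie elsewhere. The sketch rests on several assumptions that are not taken from the text: the function names are hypothetical, the criterion is read as minus the positive parts of the linear predictor over retained cases, plus $(r - 1)$ times the positive parts over deleted cases, minus $\epsilon$ times the $L_1$ norm, and the covariate rows are assumed to be coded so that a positive linear predictor lowers the likelihood of the observed outcome (for instance by flipping the sign of the covariates of cases with outcome 1).

```python
def h(beta, rows, deleted, r, eps):
    """Evaluate the (assumed) criterion h(beta, r, eps).

    rows    -- list of covariate vectors, sign-coded as described above
    deleted -- set of row indices belonging to the deleted set
    """
    total = 0.0
    for i, x in enumerate(rows):
        eta = sum(b * xi for b, xi in zip(beta, x))  # linear predictor
        positive_part = max(eta, 0.0)
        if i in deleted:
            total += (r - 1.0) * positive_part
        else:
            total -= positive_part
    return total - eps * sum(abs(b) for b in beta)


def coordinate_directions(k):
    """Yield the 2k signed unit coordinate vectors in dimension k."""
    for j in range(k):
        for sign in (1.0, -1.0):
            yield [sign if i == j else 0.0 for i in range(k)]


def found_infinite_moment_direction(rows, deleted, r, eps):
    """True if some coordinate direction has h > 0 (infinite rth moment).

    False is inconclusive: only 2k directions are examined.
    """
    k = len(rows[0])
    return any(h(b, rows, deleted, r, eps) > 0.0
               for b in coordinate_directions(k))
```

A full implementation would replace the coordinate scan with a maximization of the criterion over the whole unit $L_1$ sphere.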

As in the case of the linear model, we will investigate the $r$th moment of
the case-deleted weight function under thick-tailed and thin-tailed prior
distributions. The main tool for the proofs is, once again, Lemma 2.1. The first
corollary deals with thick-tailed distributions.

Corollary 5.1. Let the prior distribution on $\beta$ have thick tails with respect to
the class of distributions $T = \{\pi(\beta) : \pi(\beta) = c \exp(-\epsilon |\beta^T| 1)$ and $\epsilon > 0\}$. Then,
if $h(\beta, r, 0) < 0$ for all $\beta$ such that $|\beta^T| 1 = 1$, the case-deleted weight function
has finite $r$th moment ($r > 0$) with respect to the full posterior $p(\beta)$. If for
some vector $\beta$ such that $|\beta^T| 1 = 1$, $h(\beta, r, 0) > 0$, then the case-deleted weight
function has infinite $r$th moment.

The next corollary applies to thin-tailed distributions. 

Corollary 5.2. Let the prior distribution on $\beta$ have thin tails with respect to
the class of distributions $T = \{\pi(\beta) : \pi(\beta) = c \exp(-\epsilon |\beta^T| 1)$ and $\epsilon > 0\}$. Then
the case-deleted weight function has finite $r$th moment with respect to the full
posterior $p(\beta)$ for all $r > 0$.

5.1. Applying the corollaries 

The preceding corollaries enable us to determine quickly whether the case- 
deleted weight function has finite or infinite rth moment. Consider an arbitrary 
logistic regression model where the prior distribution on $\beta$ is taken to be the
normal distribution with mean $\beta_0$ and variance $\Sigma$, with $\Sigma$ of full rank. This
distribution is thin-tailed with respect to the family of prior distributions used
in Theorem 5.1. To verify this, write the ratio of priors, with $g$ representing the
normal prior density and $f$ representing the prior density under a member of
the exponential-tailed class:

$g(\beta)/f(\beta) = (2\pi)^{-k/2} |\Sigma|^{-1/2} \exp(-(\beta - \beta_0)^T \Sigma^{-1} (\beta - \beta_0)/2) \, / \, [c \exp(-\epsilon |\beta^T| 1)]$

$\le (2\pi)^{-k/2} |\Sigma|^{-1/2} c^{-1} \exp(-(1/(2\lambda_1)) (\beta - \beta_0)^T (\beta - \beta_0) + \epsilon |\beta^T| 1)$

$= (2\pi)^{-k/2} |\Sigma|^{-1/2} c^{-1} \exp(-(1/(2\lambda_1)) \sum_{i=1}^{k} (\beta_i - \beta_{0i})^2 + \epsilon \sum_{i=1}^{k} |\beta_i|)$,

where $\lambda_1$ is the largest eigenvalue of $\Sigma$. Applying Corollary 5.2 with a normal
prior distribution, we find that all positive moments of the case-deleted weight 
function are finite. This result holds, even if all of the cases are deleted. 

Suppose instead that the prior distribution on $\beta$ is taken to be a multivariate
t distribution with $\nu$ degrees of freedom, location vector $\beta_0$ and scale matrix
$\Sigma$, with $\Sigma$ of full rank. This t distribution is thick-tailed with respect to the
family of prior distributions used in Theorem 5.1. A formal verification of this
follows from an examination of the ratio of prior density functions. To establish
finiteness or infiniteness of the case-deleted moments, use Theorem 5.1 with
$\epsilon = 0$.

We note that Theorem 5.1 can be of help in establishing whether or not 
the rth moment of the case-deleted weight function will be infinite, even when 
the prior distribution is improper. If the prior density for $\beta$ is uniform on $\mathbb{R}^k$,
for example, we merely apply the theorem with $\epsilon = 0$. The conclusion of a
finite case-deleted $r$th moment is conditional upon the propriety of the posterior
distribution. This propriety is not guaranteed, as use of the uniform prior
distribution may lead to an improper posterior distribution (see [12] and [18]).
However, since the weight function $w_{\setminus I}(\beta)$ in Equation (5.1) always exceeds
one, if the first moment of the case-deleted weight function is finite, so is the 
normalizing constant for the posterior: the posterior distribution is proper if any 
case-deleted weight function has finite first moment. 

6. Central limit theorems 

The previous sections provide results that enable us to calculate the number
of moments which exist for the case-deleted weight function. The results apply
to classes of prior distributions, and so can be quickly used to establish
asymptotic normality of the importance sampling estimator $\hat{E}_{p_{\setminus I}}[g(s)]$ given in
Equation (2.1) and of the related estimator $\tilde{E}_{p_{\setminus I}}[g(s)]$. In this section, we indicate
how these results apply to a variety of measures of case influence. We also
present two techniques which are generally useful for applying the results.

Central limit theorems (CLTs) for importance sampling estimators when the
parameter vectors $s$ are generated as i.i.d. samples or arise from a uniformly
ergodic Markov chain are described in [13] and [24], respectively. Under either
source for the sample, the estimator $\hat{E}_{p_{\setminus I}}[g(s)]$ is asymptotically normal if and
only if

$\int g^2(s) \, w_{\setminus I}^2(s) \, p(s \mid y) \, ds < \infty$.    (6.1)

Sufficient conditions for $\tilde{E}_{p_{\setminus I}}[g(s)]$ to be asymptotically normal are that
condition (6.1) holds and that

$\int w_{\setminus I}^2(s) \, p(s \mid y) \, ds < \infty$.    (6.2)

These conditions are explicitly presented for i.i.d. samples in [13]. A slight technical
extension of the CLT in [24] helps to establish the result for ergodic samples.
The extension consists of an application of the Cramér-Wold device to establish
the joint asymptotic normality of the estimator of the normalizing constant for
the weight function and of an estimator proportional to $\hat{E}_{p_{\setminus I}}[g(s)]$, followed by
an application of the delta method (e.g., see [9]).
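
Equation (2.1) is not shown in this section, so the form used below is an assumption: it is the usual self-normalized importance sampling estimate, in which the weighted sum of function values is divided by the sum of the weights. The Python sketch takes the values of $g$ and of the unnormalized case-deleted weights at draws from the full posterior; the function names are illustrative.

```python
def normalizing_constant_estimate(weights):
    """Mean of the unnormalized weights (the quantity called R in Table 2)."""
    return sum(weights) / len(weights)


def self_normalized_estimate(g_values, weights):
    """Ratio estimate of a case-deleted posterior expectation.

    g_values -- g evaluated at each posterior draw
    weights  -- unnormalized case-deleted weights at the same draws
    """
    weighted_sum = sum(g * w for g, w in zip(g_values, weights))
    return weighted_sum / sum(weights)
```

The ratio form is why the delta method enters the argument: numerator and denominator are both sample averages, and their joint asymptotic normality carries over to the quotient.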

The first technique for establishing a CLT recognizes that the $g^2(s)$ term in
the integral in condition (6.1) can be grouped with $p(s \mid y)$, yielding, say, $p^*(s \mid y)$.
The quantity $p^*(s \mid y)$ is the formal posterior distribution for $s$ given the data,
provided that it is integrable. It corresponds to a proper Bayesian analysis with
$g^2(s)$-adjusted prior distribution proportional to $g^2(s)\pi(s)$, provided that
$0 < \int g^2(s)\pi(s) \, ds < \infty$. We note that this integral is against the prior distribution,
and so is typically easy to evaluate.

To facilitate application of the theorems we wish to preserve full support of
the function-adjusted prior distribution. To check the asymptotic normality of
$\hat{E}_{p_{\setminus I}}[g(s)]$ we need only verify condition (6.1), provided $g(s)$ is never equal to
zero so that the $g^2(s)$-adjusted prior has full support. In all other cases we act as
if the prior distribution had density $(1 + g^2(s))\pi(s)$. This preserves full support
of the function-adjusted prior distribution in case $g(s)$ is not always different
from zero. This also allows us to verify at once conditions (6.1) and (6.2) when
we wish to determine if $\tilde{E}_{p_{\setminus I}}[g(s)]$ is asymptotically normal.

The second technique that we find useful is to establish the finiteness of integrals
in case-deleted posteriors for a bounding function which then implies finiteness
for interesting classes of functions. The relation $\log^2(x) < (C_\epsilon + x^{-\epsilon} + x^{\epsilon})^2$
for some constant $C_\epsilon$ and all $x > 0$ connects moments of the case-deletion
weight function to finiteness of integrals for several influence measures. The
moment generating function is also a useful bounding function. Hence, we consider
$g(s) = \exp(s^T t)$ for all $t$ in some open neighborhood of 0, say, $U$. If
$\int w_{\setminus I}^2(s) \exp(s^T t) \, p(s \mid y) \, ds < \infty$ for all $t \in U$, then condition (6.1) is satisfied
for any polynomial in the components of $s$ and any constant. We note that condition (6.1)
implies condition (6.2) and the CLT applies to the importance sampling estimators
of any mixed and marginal moment of $s$.

Formal Bayesian techniques that describe the influence of a set of cases on 
an analysis focus on a one-dimensional summary of the difference between the 
case-deleted posterior distribution and the full posterior distribution. Bayesian 
measures of model fit focus on case-deleted measures of predictive accuracy and 
cross-validation. A plethora of summaries exist. In this subsection, we show how 
our results can be used to verify that a CLT holds for the summaries estimated 
on the basis of a Monte Carlo sample. We illustrate this point with a discussion 
at the end of Example 6.1 concerning estimation of the conditional predictive 
ordinate (CPO). 

This approach can be applied to many Bayesian case influence measures. 
Table 2 contains a summary of results. Each row of the table corresponds to a 
measure of influence. The measure is given under the column headed Estimand, 
and a formula for estimation is given under the heading Estimator. The last 
three columns present sufficient conditions for the estimator to follow a CLT. 
The column headed Mom's gives a number of moments of the case-deleted weight 
function; the column headed Adjmnt presents the function used to adjust the 
prior distribution, if needed, and the column headed Adj-Mom's gives a number 
of moments of the case-deleted weight function against the function-adjusted 
prior distribution. If the given numbers of moments and adjusted moments both 
exist, then a CLT holds for the estimator. 

6.1. Examples

Example 6.1. This example illustrates the practical use of the results pre- 
sented in Section 3. We fit a linear model to data assembled by the authors of 
[20] to investigate growth rates across mammalian species. Gestational time is 
known to be an important factor in determining growth rate. The data set con- 
tains 96 entries with complete information on growth rates and possibly related 
covariates for mammalian species. There is one marsupial that we excluded
from our analysis. Three of the remaining species exhibit delayed implantation,
a phenomenon by which the blastocyst, after reaching the uterus, remains
dormant and unattached to the uterine lining for an extended period of time.
An examination of the covariate gestation time (in days) led us to conclude
that the recorded gestational time for the grizzly and polar bears (Ursus arctos
and Thalarctos maritimus) included the preimplantation time while the recorded
gestational time for the nine-banded armadillo (Dasypus novemcinctus) did not.
This last gestational time was adjusted to include preimplantation. After this
adjustment, the recorded gestation time for each species included in the analysis
covers the time from egg fertilization to birth.

The response variable is a species' advancement, defined as the ratio of neona- 
tal to adult body weight. We built a linear model including an intercept and 
three covariates: the natural logarithms of gestation time, litter size, and adult 
body weight (centering all three covariates around their respective means). The 
least squares fit of this model yields a multiple R-squared of 0.4344. Based on
a Bayesian analysis with noninformative prior distributions for the model pa- 
rameters, the 95% highest posterior density (hpd) intervals for the coefficients 
for log litter size and log body weight include only negative values while the 
95% hpd interval for the coefficient for log gestation time includes only positive 
values. This indicates that heavier species, species with larger litter sizes, and 
species with shorter gestation times give birth to relatively immature offspring. 

We use the theoretical results of the preceding sections for three purposes: 
we examine the influence of a preselected group of cases on inference, we screen 
all groups of cases of a certain size for their influence, and we verify the stability 
of cross-validatory estimators of summary measures. First, consider the three 
species with delayed implantation. We interpret the moment index $r^*_I$, i.e., the
cut-off value for the existence or non-existence of the $r$th moment of the case
deletion weights (see [7] and [8]), as a measure of influence of the cases being
excluded. This cut-off value is given by the minimum of the cut-off value $r^*_{1,I}$
between the leverage conditions (a) and (a') and the cut-off value $r^*_{2,I}$ between
the distance conditions (c) and (c'). Dropping the three species leads to the
values $r^*_{1,I} = 4.74$ and $r^*_{2,I} = 2.93$. Thus $r^*_I = 2.93$ for this set of species. This
number is small, suggesting that this group of cases is influential. A glance at
Table 2 shows us that a central limit theorem will not hold for the chi-square
distance, but that it will hold for the other measures listed in the table.
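
The "glance at Table 2" can be mimicked with a small lookup, sketched here in Python. The dictionary encodes only the unadjusted moment requirements (the Mom's column) as we read them from the table; the adjusted-prior requirements are ignored, and the names are illustrative. A measure that needs $q$ moments (or $q + \delta$ for some $\delta > 0$) is treated as covered only when the moment index strictly exceeds $q$, which leaves the boundary case on the cautious side.

```python
# Moments of the case-deleted weights required for a CLT ("Mom's" column only).
REQUIRED_MOMENTS = {
    "KL": 2.0,      # stated as 2 + delta
    "L1": 2.0,
    "L2": 2.0,
    "Delta1": 2.0,
    "Delta2": 2.0,
    "Hel": 2.0,
    "ChSq": 4.0,
    "CPO": 2.0,
    "Bdd": 2.0,
}


def measures_without_clt(r_star):
    """Names of measures whose CLT is not guaranteed at moment index r_star."""
    return sorted(name for name, needed in REQUIRED_MOMENTS.items()
                  if r_star <= needed)
```

With the value 2.93 quoted above, only the chi-square distance is flagged, in line with the conclusion drawn in the text.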

As with traditional measures of influence, we consider where our set of observations
falls on the measures $r^*_{1,I}$ and $r^*_{2,I}$ with respect to other sets of similar
size. We scanned all triples of species, computing cut-offs for each triple. Ordering
the triples of dropped species according to their increasing values of $r^*_{1,I}$,
we found that the nine-banded armadillo belongs to 99 of the top 100 triples
(all but the 31st), while the grizzly bear belongs to 2 of the top 100 triples (the
18th and the 99th), and the polar bear belongs to one of the top 100 triples (the
38th). Our three species in combination rank 1343rd out of the 138415 triples,
with an $r^*_{1,I}$ value of 4.74. Ordering the triples of dropped species according to
their increasing values of $r^*_{2,I}$, we find that both bear species belong to each
of the top 93 triples and that the grizzly bear belongs to each of the top 167
triples. Dropping all three species with delayed implantation at once yields the
6th smallest value for $r^*_{2,I}$. From this comparative analysis, we conclude that
the three species with delayed implantation may be influential, the nine-banded
armadillo due mainly to its leverage and the two bear species due mainly to
their outlyingness. This set of three species stands out, as there is a common
underlying factor that may differentiate them from the other species.
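
The exhaustive scan over triples is simple to organize. The Python sketch below enumerates all subsets of a given size and sorts them by a cut-off value; `moment_index` is a hypothetical callable standing in for the computation of either cut-off from the leverage or distance conditions, which is not reproduced here. The count of triples quoted in the text follows from the 95 species left after the marsupial is excluded.

```python
from itertools import combinations
from math import comb


def rank_subsets(n_cases, size, moment_index):
    """Return (index value, subset) pairs sorted by increasing index value.

    moment_index -- hypothetical callable mapping a tuple of case indices
                    to its cut-off value
    """
    scored = [(moment_index(subset), subset)
              for subset in combinations(range(n_cases), size)]
    scored.sort()
    return scored


# Number of triples among the 95 species retained in the analysis.
n_triples = comb(95, 3)
```

Subsets at the head of the returned list have the smallest cut-offs and are therefore the candidates for high influence; counting how often a given species appears among the first 100 entries reproduces the kind of summary reported above.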

Pursuing the potential influence of our triple of cases, we examine whether 
inclusion of the dormant period in the total gestation time affects the conclusions 
that we draw from the model. To answer this question we adjusted the gestation 
times of these species to account only for the period of actual development and 
reconsidered the linear model described above. The least squares fit now yields a 
multiple R-squared of 0.5267. The 95% hpd interval for the coefficient for log litter
size now contains 0, suggesting possible simplification of the model, although 
the qualitative interpretations of the impact of species weight, litter size, and 
gestation time remain the same. 

Repeating the earlier exercise of dropping triples of cases, we find that the 
leverage of the nine-banded armadillo is a little diminished, as it now enters only 
in 7 of the top 100 triples for $r^*_{1,I}$. The grizzly bear has a little more leverage, as
it now belongs to 4 of the top 100 triples (the 7th, 10th, 17th and 65th), while
the polar bear belongs to just one of the top 100 triples (the 98th). The three
species in combination rank 1916th with an $r^*_{1,I}$ value of 5.24. The smallest
value of $r^*_{2,I}$, which now equals 3.65, is attained when the three species Papio
papio, Ursus arctos, and Thalarctos maritimus are dropped. Both bear species
belong to each of the top 22 triples, the grizzly bear belongs to 96 of the top
100 triples and the polar bear belongs to 97 of the top 100 triples. Dropping all
three species with delayed implantation at once yields the 23rd smallest value
for $r^*_{2,I}$.

Thus, the three species with delayed implantation are still influential when 
the model is fit to the adjusted gestation times, although the extent of their 
influence is slightly diminished. According to both analyses, the two bear species 
are highly influential, due mainly to their large residuals. This not only confirms 
the well known fact that bears have an unusually small advancement but also 
reveals that the dormant gestation period by itself cannot account for it. Quoting 
from a January 27, 2004 New York Times article (see [1]): 

Polar bears share with all bears an extreme disparity between the size of their 
mother, in the quarter-ton range, and that of a newborn cub — about a pound. 
"It's dramatic trait in the bear family," Dr. Peatkau said. "They are off the chart 
among placental mammals, and closer to marsupials like the kangaroo." 

Model fit is commonly assessed via k-fold cross validation. The data are
partitioned at random into k subsets of approximately equal sizes and each of
the k subsets is used in turn as a test set with the union of the remaining $k - 1$
subsets serving as a training set. The model is fit to the data in the training
sets and its predictions are compared to the actual values of the observations
in the test sets by computing some measure of predictive ability averaged over
the k sets of predictions. In a Bayesian analysis, CPO provides a measure of
overall predictive ability. The cross-validated CPO can be estimated with draws
from the full posterior and importance sampling weights. However, as noted in
Table 2, for a given partition, the central limit theorem will not hold if $r^*_I$ drops
below 2 when any of the k subsets of observations is excluded.
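
To make the importance sampling estimate of CPO concrete, here is a minimal Python sketch. It assumes, as in the caption of Table 2, that the weight is the reciprocal of the likelihood of the held-out cases. Under that assumption the case-deleted posterior is proportional to the full posterior times the weight, so the integral of the held-out likelihood against the case-deleted posterior equals the reciprocal of the full-posterior mean of the weight. Whether this is exactly the estimator intended in the table is an assumption, and the function name is illustrative.

```python
def cpo_estimate(heldout_likelihoods):
    """Estimate CPO from held-out likelihood values at full-posterior draws.

    heldout_likelihoods -- likelihood of the deleted cases evaluated at
                           each draw from the full posterior (all > 0)
    """
    weights = [1.0 / lik for lik in heldout_likelihoods]
    # Reciprocal of the mean weight, i.e. a harmonic mean of the likelihoods.
    return len(weights) / sum(weights)
```

A single draw with a very small held-out likelihood can dominate the sum of weights, which is the practical face of the moment condition discussed in the text.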

To investigate how often the central limit theorem breaks down for CPO, we 
considered the case of 5-fold cross validation for the model fit to the data used 
in the first analysis and simulated 10,000 random partitions of the data into 5 
subsets of size 19 each. For each split we computed five values of $r^*_I$. Out of the
total 10,000 simulated partitions, there were 658 partitions where $r^*_I$ dropped
below 2 for exactly one of the five case deletions and there was one partition
where it dropped below 2 for two of the five case deletions. The value of $r^*_I$
never dropped below 2 for more than two of the five case deletions.
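
The bookkeeping for such a simulation can be sketched as follows in Python. The sketch draws random partitions into equally sized folds and tallies, for each partition, how many folds have a moment index below the threshold. `moment_index` is again a hypothetical callable that returns the moment index for a set of deleted cases; its computation is not shown. Equal fold sizes are assumed, which holds here since 95 = 5 × 19, and the seed is arbitrary.

```python
import random


def random_partition(n_cases, n_folds, rng):
    """Split case indices 0..n_cases-1 at random into n_folds equal folds."""
    cases = list(range(n_cases))
    rng.shuffle(cases)
    fold_size = n_cases // n_folds  # assumes n_cases is divisible by n_folds
    return [cases[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]


def count_failures(n_cases, n_folds, n_partitions, moment_index,
                   threshold=2.0, seed=0):
    """Tally partitions by the number of folds whose index is below threshold.

    Returns a dict mapping 'number of failing folds' to 'number of partitions'.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_partitions):
        folds = random_partition(n_cases, n_folds, rng)
        bad = sum(1 for fold in folds if moment_index(fold) < threshold)
        counts[bad] = counts.get(bad, 0) + 1
    return counts
```

Calling `count_failures(95, 5, 10000, moment_index)` with a real implementation of the index would produce a tally of the same kind as the one reported above.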

If it is established that, for a particular partition, no central limit theorem 
holds for importance sampling estimation of CPO, then the analyst must turn to 
other methods of estimation. For example, sampling from a mixture distribution 
with components given by the full posterior and by the case deleted posteriors 
conditional on those subsets for which $r^*_I < 2$ ensures the existence of a central
limit theorem for the estimate of CPO. 

Example 6.2. The authors of [11], in their influential paper on Bayesian model
selection/model averaging, put a prior distribution over a collection of Bayesian
linear models. There have been a host of extensions of their model, most of
which are amenable to the treatment below. Formally, we describe a prior distribution
having the form of Equation (3.2). The likelihood for the model follows
Equation (3.1). The prior distribution on the error variance is $\sigma^2 \sim IG(\alpha, \beta)$.
The prior distribution on the regression coefficients is described in two stages.
At the first stage, there is an indicator vector of whether a regressor, $\theta_j$, "appears
in" the model. The indicators are independent Bernoulli($p_j$) variates. If
the regressor does not, then the conditional prior distribution on $\theta_j$ is $N(0, \tau^2)$
with small $\tau$; if the regressor does, then the conditional prior distribution on
$\theta_j$ is $N(0, c\tau^2)$, with large $c > 1$. Marginalizing over the indicators, the prior
distribution on an individual regressor is $\theta_j \sim (1 - p_j) N(0, \tau^2) + p_j N(0, c\tau^2)$. The resulting
prior distribution remains absolutely continuous with respect to Lebesgue measure
while effectively allowing regressors to be included in or excluded from the
model.
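
The marginal prior on a single coefficient is a two-component normal mixture, and a short Python sketch shows how its density can be evaluated. The function names and argument order are illustrative assumptions; the variances of the two components are $\tau^2$ and $c\tau^2$, as in the description above.

```python
from math import exp, pi, sqrt


def normal_pdf(x, variance):
    """Density at x of a normal distribution with mean 0 and given variance."""
    return exp(-x * x / (2.0 * variance)) / sqrt(2.0 * pi * variance)


def spike_slab_pdf(theta, p, tau2, c):
    """Mixture density (1 - p) N(0, tau2) + p N(0, c * tau2) at theta."""
    spike = normal_pdf(theta, tau2)       # regressor effectively excluded
    slab = normal_pdf(theta, c * tau2)    # regressor included
    return (1.0 - p) * spike + p * slab
```

Because both components are proper normal densities, the mixture has moments of every order, which is what makes the function-adjusted prior in this example proper.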

The regression analysis is used to estimate the regression coefficients and the 
associated posterior expected loss. Pursuing a decision theoretic approach, we 
ask when the case-deleted importance sampling estimators follow CLTs. We use
the standard sum-of-squared error loss, so that $L(\theta, a) = \sum_j (\theta_j - a_j)^2$. The
Bayes action, $a$, is the posterior mean vector. Here, we focus on the posterior
expected loss. The posterior expected loss is $E[L(\theta, a) \mid y] = \sum_j \mathrm{var}(\theta_j \mid y)$,
and then, for the asymptotic normality of $\hat{E}_{p_{\setminus I}}[g(s)]$ or $\tilde{E}_{p_{\setminus I}}[g(s)]$, the function
$g(\theta)$ to be considered is $g(\theta) = \sum_j \theta_j^2$.

We now proceed with the technique. First, we verify that the function-adjusted
prior distribution is proper. Since the prior distribution on the regression
coefficients is a finite mixture of normals,

$\int (1 + g^2(\theta)) \, \pi(\theta) \, d\theta < \infty$.    (6.3)

Next, we consider Theorem 3.1 as applied to the function-adjusted prior distribution
with $r = 2$. If conditions (a), (b) and (c) of the theorem hold for a
particular case-deletion, then the case-deleted weight function has finite second
moment, or equivalently, $\int w_{\setminus I}^2(\theta, \sigma^2)(1 + g^2(\theta)) \, p(\theta, \sigma^2 \mid y) \, d\theta \, d\sigma^2 < \infty$,
establishing conditions (6.1) and (6.2) and hence a CLT for the estimators
$\hat{E}_{p_{\setminus I}}[g(s)]$ and $\tilde{E}_{p_{\setminus I}}[g(s)]$. (Finiteness of the previous integral implies finiteness
of $\int \sum_j w_{\setminus I}^2(\theta, \sigma^2) \, \theta_j \, p(\theta, \sigma^2 \mid y) \, d\theta \, d\sigma^2$.) We note that conditions (a), (b)
and (c) involve leverage, residuals from a least squares regression including all
regressors and number of cases deleted. They do not directly involve the prior
distribution, beyond the parameters $\alpha$ and $\beta$ of the prior on $\sigma^2$.

The impact of the prior distribution's tail behavior on decision rules is dis- 
cussed in [1]. Robustness considerations suggest that it is often wise to use a 
prior distribution with thicker tails than the likelihood. For MCMC algorithms,
a convenient replacement of the normal distribution is a t-distribution; see, for
example, [4]. The technique used above can be directly applied and yields the same
results when the prior distribution for $\theta_j$ is $(1 - p_j) N(0, \tau^2) + p_j T(d, 0, c\tau^2)$, with
the latter term in the mixture a t-distribution with $d > 4$ degrees of freedom,
center 0 and scale $c\tau^2$. The requirement $d > 4$ guarantees that condition (6.3)
holds.

Example 6.3. The results of a study used to estimate the survival distribution 
for leukemia patients are presented in [10]. The response variable is survival 
time (from diagnosis), and explanatory variables are white blood cell count at 
diagnosis (WBC) and whether "Auer rods and/or significant granulature of the
leukemic cells in the bone marrow at diagnosis" were present (AG positive)
or absent (AG negative). The authors of [10] develop estimates of the survival
distribution based upon presumed exponential distributions which are allowed
to depend on the covariates. The authors of [6] dichotomize the survival times by
defining a new response which indicates survival past 50 weeks. They analyze the
data with the frequentist counterpart to the logistic regression model described
in Section 5, where there are k = 3 covariates: an intercept, WBC and AG. The
authors of [6] identify one case, a patient with a high WBC count and a survival
time of more than 50 weeks, as having extremely large influence. They also note 
that altering the model (to predict survival based on log(WBC) and AG) can 
reduce the influence of the case. 

We examine influence under a product of double-exponential prior distributions
for $\beta$. The distribution has scale parameter 10 in each direction (and hence
a prior distribution with mean for $\beta_j \mid (\beta_j > 0)$ of 10). Case 15, diagnosed in [6]
as an influential observation, is easily found to have an infinite variance for its
case-deleted weight function. The value of the criterion is found to be
$h((0, -1, 0)^T, 2, 0.1) = 45.15$. This value is well in excess of 0, and indicates
that the choice of $\epsilon = 0.1$ for the prior distribution has little to do with why
the case-deleted weight function has infinite variance. On the other hand, the
value of the criterion in the positive direction for $\beta_2$ is less than 0, indicating
that this tail of the distribution of $\beta_2$ is well-behaved. No other case results in
an infinite variance for its case-deleted weight function.

For case 15 condition (6.2) does not hold and the estimators $\tilde{E}_{p_{\setminus I}}[g(s)]$ and
$\hat{E}_{p_{\setminus I}}[g(s)]$ are not asymptotically normal. Condition (6.2) holds for all remaining
observations. Thus we can establish the asymptotic normality of their associated
estimators by showing that condition (6.1) holds as well. We do this by
using the bounding strategy described above, showing that $h(\beta, 2, \epsilon) < 0$ for
all $\beta$ with $|\beta^T| 1 = 1$ implies the existence of an open neighborhood $U$ of 0 such
that $\int w_{\setminus I}^2(s) \exp(s^T t) \, p(s \mid y) \, ds < \infty$ for all $t \in U$. Hence, it follows that
$\int \exp\{h(\beta, 2, \epsilon) + 2\beta^T t\} \, d\beta$ is finite for all $t$ in $U$, which in turn, arguing as in
the proof of Theorem 5.1, implies that $\int w_{\setminus I}^2(s) \exp(s^T t) \, p(s \mid y) \, ds < \infty$ for all
$t \in U$.

It is interesting to note that the analysis above is not strictly connected to the 
particular choice of the prior distribution as a product of double exponentials. 
Indeed, in light of Corollary 5.1, if the (proper) prior distribution on $\beta$ is
thick-tailed with respect to the family of product of double exponential distributions,
then $h(\beta, 2, 0) < 0$ for all $\beta$ with $|\beta^T| 1 = 1$ still implies both conditions (6.1) and
(6.2). This is true, even when the noninformative prior distribution $\pi(\beta) \propto 1$ is
assigned. Finally, if $\pi(\beta)$ is thinner-tailed than any product of double exponentials,
then conditions (6.1) and (6.2) are always satisfied.

7. Conclusions 

The development of effective computational tools for fitting hierarchical models 
has spurred the growth of Bayesian data analysis. As with its classical counter- 
part, a complete Bayesian data analysis investigates sensitivity of inferences to 
changes in the data set, with particular consideration given to excluding obser- 
vations from the analysis. This exclusion is most often accomplished through the 
use of importance sampling based on case-deleted weight functions. The theoret- 
ical results in Sections 3 through 5 provide conditions under which importance 
sampling estimators of various functionals will follow central limit theorems. 
Further results along these lines may be obtained for other likelihoods (particu- 
larly those in the exponential family) and for other specific model structures (as 
in Section 4). The techniques in Section 6 provide a simple means of verifying the 
conditions of the earlier theorems. We have found that the combination of these 
techniques and the theorems allows us to easily verify (or disprove) asymptotic
normality of many estimators. 

The results can be used to evaluate computational strategies. In many
situations, computations can be hastened by sampling from a formal model that
uses a nicely structured prior distribution, say $\pi_s(s)$, in place of the actual
prior distribution, $\pi(s)$. This change may be motivated by the speed of programming
conjugate calculations or by the speed of execution of the algorithm (e.g., see
[17]) used to fit the model. With the altered model, inference is made through
use of importance sampling with weights $w_p(s) = \pi(s)/\pi_s(s)$. When concerned
about the effects of groups of cases, these importance sampling weights can be
combined with the case-deletion weights to produce inference under the
case-deleted posterior distribution. The weights are $w(s) = w_p(s)\,w_{\setminus I}(s)$. Suppose
that the weights due to the prior distribution have $r_p$ moments and the
case-deletion weights have $r_I$ moments (under the model with prior distribution $\pi_s$).
Then a straightforward calculation shows that the combined weights have at
least $(r_p^{-1} + r_I^{-1})^{-1}$ moments. Thus, the suitability for quick and efficient data
analysis based on the computational strategy where $\pi$ is replaced by $\pi_s$ for the
sampling algorithm can be evaluated.
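
The moment count for the combined weights can be sketched in a few lines. The bound is what Hölder's inequality gives for a product of two weight functions, which is presumably the "straightforward calculation" referred to above. The function names are hypothetical, and the threshold of two finite moments is an assumption made here for illustration of a central limit theorem check.

```python
def combined_moment_order(r_prior: float, r_deletion: float) -> float:
    """Lower bound on the number of finite moments of the product weight.

    r_prior and r_deletion are the numbers of finite moments of the
    prior-correction weights and of the case-deletion weights.
    """
    if r_prior <= 0 or r_deletion <= 0:
        raise ValueError("moment orders must be positive")
    return 1.0 / (1.0 / r_prior + 1.0 / r_deletion)


def guarantees_two_moments(r_prior: float, r_deletion: float) -> bool:
    """True when the bound alone guarantees two finite moments (assumed target)."""
    return combined_moment_order(r_prior, r_deletion) >= 2.0


# Example: four moments for each factor guarantee two for the product.
order = combined_moment_order(4.0, 4.0)
```

Because the bound is only a lower bound, a value below two is inconclusive rather than proof that the second moment is infinite.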

There is a strong connection between the tail of the prior distribution relative 
to the likelihood and the robustness of inference based on the model. Sentiment 
generally favors prior distributions with thicker tails than the likelihood. With 
a thick-tailed prior distribution, when there is a clash between likelihood and 
prior, inference is dominated by the likelihood (e.g., see [2], Chapter 4). Our 
preference is to select a prior distribution that reflects the analyst's beliefs. Of- 
ten, this will be a thick-tailed prior distribution, leading to simplified conditions 
such as those in Corollaries 3.1 or 5.1. While our preference is to select the prior 
distribution on the basis of modeling considerations, we do note that the results 
of this paper could be used to select a prior with tails thin enough to guarantee 
existence of some targeted r moments. 

The results we derive apply to broad classes of models. As an example, the
specification of the normal theory linear model in (3.1) and (3.2) can mask a
much richer hierarchical model. The richer model may include further parameters,
say $\gamma$, where the prior distribution on $\theta$ depends on $\gamma$. As long as the likelihood
is a function only of $\theta$ and $\sigma^2$, the case-deleted weight function will also be a
function of these parameters. The theorems are applied with the marginal prior
distribution of $\theta$ and $\sigma^2$. The prior specifications in [11] and [19] may be viewed
in this light.

Models which combine different studies provide a less evident match for these 
theorems. A typical linear model used for such combination will allow the re- 
gression coefficients to vary from study to study. Such variation is captured 
with a hierarchical model that links the coefficients across studies by means 
of hyperparameters. The overall model can be expressed in graphical form as 
a hierarchical model. The advantage of the general conditions in the theorems 
that describe only the tail behavior of the prior distribution becomes apparent 
in this setting. For case deletions involving only one study, and referring to the
notation of the previous paragraph, $\gamma$ includes the parameters specific to the
other studies, the data specific to the other studies, and the hyperparameters.
Thus the marginal prior distribution on $\theta$ and $\sigma^2$ to be used in the theorems is
the marginal distribution on these parameters, posterior to the data from the
other studies. While this distribution is usually inaccessible in closed form, one 
can often verify that its tails behave like some (unspecified) normal distribution 
or that they are thicker than the class of normal distributions. This is sufficient 
for application of the theoretical results.

APPENDIX

Proof of Lemma 2.1

To prove the first part of the lemma, note that

$$S_1 = c_1^{-1}\int f(x)\,\pi_1(x)\,h(x)\,dx \ge c_1^{-1}\int f(x)\,B^{-1}\pi_0(x)\,h(x)\,dx = B^{-1}c_1^{-1}c_0 S_0 = \infty.$$

To prove the second part of the lemma, note that

$$S_1 = c_1^{-1}\int f(x)\,\pi_1(x)\,h(x)\,dx \le c_1^{-1}\int f(x)\,b^{-1}\pi_0(x)\,h(x)\,dx = c_1^{-1}b^{-1}c_0 S_0 < \infty.$$

The third part of the lemma follows from the first two parts.

The proof of Theorem 3.1 relies on the following two lemmas. 

Lemma A.1. Let $\lambda_1 \le \cdots \le \lambda_I$ denote the eigenvalues of $H_I$. The matrix
$(\mathbf{I} - r H_I)$ is non-singular if and only if $\lambda_i \neq 1/r$, for every $i = 1, \ldots, I$. If
$(\mathbf{I} - r H_I)$ is non-singular, then it is positive definite if and only if $\lambda_I < 1/r$.

Proof. Because for all $l \in \mathbb{R}$ and for all $r > 0$, $[\mathbf{I} - r H_I - l\,\mathbf{I}] = -r[H_I - ((1-l)/r)\,\mathbf{I}]$,
then the $I$ eigenvalues of $(\mathbf{I} - r H_I)$ are $1 - r\lambda_1 \ge \cdots \ge 1 - r\lambda_I$ and the statements
in the lemma follow directly. □
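
The eigenvalue criterion of Lemma A.1 is easy to check numerically. The following is a minimal sketch, assuming NumPy is available and that $H_I = X_I^T (X^T X)^{-1} X_I$ with the covariate vectors of the deleted cases as the columns of $X_I$. The function names and the toy design matrix are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def hat_block(X, deleted):
    """H_I = X_I^T (X^T X)^{-1} X_I for the deleted cases (rows of X)."""
    X_I = X[deleted, :].T                          # shape (k, I)
    return X_I.T @ np.linalg.solve(X.T @ X, X_I)   # shape (I, I)


def lemma_a1_check(X, deleted, r):
    """Return eigenvalues of H_I and the two properties of (I - r H_I)."""
    H_I = hat_block(X, deleted)
    lam = np.linalg.eigvalsh((H_I + H_I.T) / 2.0)  # ascending order
    nonsingular = not np.any(np.isclose(lam, 1.0 / r))
    positive_definite = nonsingular and bool(lam[-1] < 1.0 / r)
    return lam, nonsingular, positive_definite


# Toy design: intercept plus covariate x = 0, ..., 5; delete cases 1 and 2.
X = np.column_stack([np.ones(6), np.arange(6.0)])
lam, ok2, pd2 = lemma_a1_check(X, [1, 2], r=2.0)
_, ok3, pd3 = lemma_a1_check(X, [1, 2], r=3.0)

# Eigenvalues of (I - r H_I) computed directly, for comparison with 1 - r*lam.
eig_direct = np.linalg.eigvalsh(np.eye(2) - 2.0 * hat_block(X, [1, 2]))
```

In this toy design the largest eigenvalue of $H_I$ lies between 1/3 and 1/2, so the matrix is positive definite for r = 2 but not for r = 3.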

Lemma A.2. (i) $(X^T X - r X_I X_I^T)$ is singular if and only if $(\mathbf{I} - r H_I)$ is
singular.

(ii) $(X^T X - r X_I X_I^T)$ is positive definite if and only if $(\mathbf{I} - r H_I)$ is positive
definite.

Proof. To prove the lemma we use a formula for matrix inversion given in [14].
For every square matrix $W$ and any conforming rectangular matrices $U$ and $V$,
assuming that each of the stated inverses exists:

$$(W + U^T V)^{-1} = W^{-1} - W^{-1} U^T (\mathbf{I} + V W^{-1} U^T)^{-1} V W^{-1}. \tag{A.1}$$

By applying formula (A.1) to the matrices $W = X^T X$, $U = -r X_I^T$ and $V = X_I^T$,
an expression for the inverse of $(X^T X - r X_I X_I^T)$, when it exists, is given by:

$$(X^T X - r X_I X_I^T)^{-1} = (X^T X)^{-1} + r (X^T X)^{-1} X_I (\mathbf{I} - r H_I)^{-1} X_I^T (X^T X)^{-1}. \tag{A.2}$$

On the other hand, if we substitute $W = \mathbf{I}$, $U = -r (X^T X)^{-1} X_I$ and $V = X_I$
into Equation (A.1), an expression for $(\mathbf{I} - r H_I)^{-1}$ is given by

$$(\mathbf{I} - r H_I)^{-1} = \mathbf{I} + r X_I^T (X^T X - r X_I X_I^T)^{-1} X_I. \tag{A.3}$$

Thus, we can use formula (A.2) to verify the "if" part of proposition (i) and
formula (A.3) to verify the "only if" part. With regard to proposition (ii) of the
lemma, observe that if $(\mathbf{I} - r H_I)^{-1}$ is positive definite, then $X_I (\mathbf{I} - r H_I)^{-1} X_I^T$
is positive semi-definite and Equation (A.2) shows that $(X^T X - r X_I X_I^T)^{-1}$ can
be written as the sum of a positive definite matrix, $(X^T X)^{-1}$, and a positive
semi-definite matrix. As such, it is positive definite and $(X^T X - r X_I X_I^T)$ must
be positive definite as well. Looking at Equation (A.3) and arguing in a similar
manner, the necessary condition in proposition (ii) may be proved. □
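
The two inversion identities used in this proof, (A.2) and (A.3), can be confirmed numerically on a small example. This is only a sanity-check sketch, assuming NumPy and the convention that the columns of $X_I$ are the covariate vectors of the deleted cases. The toy design and the variable names are illustrative assumptions.

```python
import numpy as np

r = 2.0
X = np.column_stack([np.ones(6), np.arange(6.0)])   # n = 6 cases, k = 2
X_I = X[[1, 2], :].T                                 # k x I, cases 1 and 2 deleted

W = np.linalg.inv(X.T @ X)                           # (X^T X)^{-1}
H_I = X_I.T @ W @ X_I                                # I x I block of the hat matrix
M = np.linalg.inv(np.eye(2) - r * H_I)               # (I - r H_I)^{-1}
A_inv = np.linalg.inv(X.T @ X - r * X_I @ X_I.T)     # left-hand side of (A.2)

# Right-hand sides of (A.2) and (A.3)
rhs_a2 = W + r * W @ X_I @ M @ X_I.T @ W
rhs_a3 = np.eye(2) + r * X_I.T @ A_inv @ X_I

a2_holds = np.allclose(A_inv, rhs_a2)
a3_holds = np.allclose(M, rhs_a3)
```

Both flags should be true whenever the inverses exist, which is the situation covered by the lemma.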

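Proof of Theorem 3.1

Part (i). The assumption that $\lambda_i \neq 1/r$ for all $i = 1, \ldots, I$ implies that

$$\tilde\theta = (X^T X - r X_I X_I^T)^{-1} (X^T y - r X_I y_I) \tag{A.4}$$

is well defined in view of formula (A.2) and Lemma A.1, and the posterior $r$th
moment of $w_{\setminus I}(s)$, $E(w^r_{\setminus I}(s) \mid y)$, is proportional to

$$\int w^r_{\setminus I}(s)\,q(s)\,ds = \int (\sigma^2)^{-(n-rI)/2-\alpha-1}
\times \exp\{-1/(2\sigma^2)[y^T y - r y_I^T y_I - \tilde\theta^T (X^T X - r X_I X_I^T)\tilde\theta + 2/\beta]\}
\times \exp\{-1/(2\sigma^2)(\theta - \tilde\theta)^T (X^T X - r X_I X_I^T)(\theta - \tilde\theta)\}\,\pi_1(\theta)\,d\theta\,d\sigma^2. \tag{A.5}$$

If condition (a) holds, then, by Lemma A.2, $(X^T X - r X_I X_I^T)$ is positive definite,
and

$$E(w^r_{\setminus I}(s) \mid y) \le \text{const} \times \int (\sigma^2)^{-(n-rI)/2-\alpha-1}
\times \exp\{-1/(2\sigma^2)[y^T y - r y_I^T y_I - \tilde\theta^T (X^T X - r X_I X_I^T)\tilde\theta + 2/\beta]\}\,d\sigma^2. \tag{A.6}$$

Using the expression for $(X^T X - r X_I X_I^T)^{-1}$ given in Equation (A.2) and the
property that $H_I$ commutes with $(\mathbf{I} - r H_I)^{-1}$, we obtain:

$$\begin{aligned}
y^T y &- r y_I^T y_I - \tilde\theta^T (X^T X - r X_I X_I^T)\tilde\theta \\
&= y^T(\mathbf{I} - H)y - r\,y_I^T[\mathbf{I} + r H_I + r^2 H_I(\mathbf{I} - r H_I)^{-1} H_I]\,y_I \\
&\quad + 2r\,y_I^T[\mathbf{I} + r H_I(\mathbf{I} - r H_I)^{-1}]\,X_I^T (X^T X)^{-1} X^T y \\
&\quad - r\,y^T X (X^T X)^{-1} X_I (\mathbf{I} - r H_I)^{-1} X_I^T (X^T X)^{-1} X^T y \\
&= y^T(\mathbf{I} - H)y - r\,e_I^T(\mathbf{I} - r H_I)^{-1} e_I = \mathrm{RSS}^*_{\setminus I}(r).
\end{aligned}$$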

Thus, the integrand in Equation (A.6) is proportional to an inverse gamma
density if conditions (b) and (c) hold. Sufficiency of conditions (a)-(c) is
proved. Suppose now that any of conditions (a'), (b') or (c') holds. If (b')
is true then, as $\sigma^2 \to \infty$, the integrand in (A.5) goes to zero too slowly
and it is not integrable. On the other hand, if (c') holds, because quadratic
forms are continuous and because $\pi_1$ has full support, then there exists a
neighborhood $C_1$ of $\tilde\theta$ having positive Lebesgue measure such that
$\mathrm{RSS}^*_{\setminus I}(r) + 2/\beta + (\theta - \tilde\theta)^T (X^T X - r X_I X_I^T)(\theta - \tilde\theta) < 0$ for all $\theta \in C_1$.
Also, when (a') holds, because $(X^T X - r X_I X_I^T)$ is a non-positive-definite,
non-singular matrix, we can find a set $C_2$, depending on $\beta$ and $\mathrm{RSS}^*_{\setminus I}(r)$, with
positive Lebesgue measure, such that
$\mathrm{RSS}^*_{\setminus I}(r) + 2/\beta + (\theta - \tilde\theta)^T (X^T X - r X_I X_I^T)(\theta - \tilde\theta) < 0$, for all $\theta \in C_2$. Thus,
under either of conditions (a') or (c'), the integrand in (A.5) approaches infinity
at an exponential rate as $\sigma^2 \to 0$ for every $\theta$ belonging to a set with positive
Lebesgue measure. It follows that $E(w^r_{\setminus I}(s) \mid y) = \infty$.

Part (ii). If the standard noninformative prior $\pi(\theta, \sigma^2) \propto 1/\sigma^2$ is used, we
can obtain an expression for $\int w^r_{\setminus I}(s)\,q(s)\,ds$ by setting $\alpha = 0$ and $\pi_1(\theta) \propto 1$
and letting $\beta$ tend to infinity in Equation (A.5). Then, if condition (a) holds,
we have

$$\int \exp\{-1/(2\sigma^2)(\theta - \tilde\theta)^T (X^T X - r X_I X_I^T)(\theta - \tilde\theta)\}\,\pi_1(\theta)\,d\theta
= (2\pi\sigma^2)^{k/2}\,|X^T X - r X_I X_I^T|^{-1/2},$$

where here $|\cdot|$ denotes the determinant of its argument and

$$E(w^r_{\setminus I}(s) \mid y) \propto (2\pi)^{k/2}\,|X^T X - r X_I X_I^T|^{-1/2}
\times \int (\sigma^2)^{-(n-rI-k)/2-1} \exp\{-1/(2\sigma^2)\,\mathrm{RSS}^*_{\setminus I}(r)\}\,d\sigma^2. \tag{A.7}$$

The integral on the right-hand side is finite if conditions (b) and (c) (as given in
the statement of part (ii)) hold and sufficiency in part (ii) is shown. The proof
of the "only if" part proceeds as in part (i).
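
The quantities appearing in this proof can be computed directly. The sketch below evaluates $\mathrm{RSS}^*_{\setminus I}(r)$ in its two equivalent forms and checks the three sufficient conditions as they can be read off the proof of part (i): largest eigenvalue of $H_I$ below $1/r$, $\alpha > -(n - rI)/2$, and $\mathrm{RSS}^*_{\setminus I}(r) > -2/\beta$. NumPy, the function names, the toy data, and the inverse gamma parameterization (density proportional to $(\sigma^2)^{-\alpha-1}\exp\{-1/(\beta\sigma^2)\}$, inferred from the $2/\beta$ term) are assumptions made for illustration.

```python
import numpy as np


def rss_star(X, y, deleted, r):
    """RSS*(r) = y^T (I - H) y - r e_I^T (I - r H_I)^{-1} e_I."""
    n = X.shape[0]
    H = X @ np.linalg.inv(X.T @ X) @ X.T        # full hat matrix
    e = y - H @ y                                # ordinary residuals
    H_I = H[np.ix_(deleted, deleted)]
    M = np.linalg.inv(np.eye(len(deleted)) - r * H_I)
    return float(y @ (np.eye(n) - H) @ y - r * e[deleted] @ M @ e[deleted])


def rss_star_direct(X, y, deleted, r):
    """Same quantity via y^T y - r y_I^T y_I - theta~^T A theta~."""
    X_I, y_I = X[deleted, :].T, y[deleted]
    A = X.T @ X - r * X_I @ X_I.T
    theta = np.linalg.solve(A, X.T @ y - r * X_I @ y_I)
    return float(y @ y - r * y_I @ y_I - theta @ A @ theta)


def conditions_hold(X, y, deleted, r, alpha, beta):
    """Check conditions (a), (b), (c) as read from the proof (assumed form)."""
    n, n_del = X.shape[0], len(deleted)
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    lam_max = np.linalg.eigvalsh(H[np.ix_(deleted, deleted)])[-1]
    cond_a = bool(lam_max < 1.0 / r)
    cond_b = alpha > -(n - r * n_del) / 2.0
    cond_c = cond_a and rss_star(X, y, deleted, r) > -2.0 / beta
    return bool(cond_a and cond_b and cond_c)


# Toy data (illustrative only): intercept plus x = 0, ..., 5.
X = np.column_stack([np.ones(6), np.arange(6.0)])
y = np.array([1.0, 2.1, 2.9, 4.2, 5.1, 5.8])
```

Calling `conditions_hold(X, y, [1, 2], 2.0, 1.0, 1.0)` checks the case r = 2, the order relevant for a finite variance of the weights.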

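Proof of Corollary 3.1

Let $E_j(w^r_{\setminus I}(\theta, \sigma^2) \mid y)$ denote the posterior $r$th moment of the weight function
when the prior distribution for $(\theta, \sigma^2)$ is given by $\pi_{11}(\theta) \times \pi_{j2}(\sigma^2)$, for $j = 0, 1$.
If $\lambda_i \neq 1/r$ for all $i = 1, \ldots, I$, then $E_j(w^r_{\setminus I}(\theta, \sigma^2) \mid y)$ is proportional to

$$\int (\sigma^2)^{-(n-rI)/2} \exp\{-1/(2\sigma^2)[\mathrm{RSS}^*_{\setminus I}(r)
+ (\theta - \tilde\theta)^T (X^T X - r X_I X_I^T)(\theta - \tilde\theta)]\}\,\pi_{11}(\theta)\,\pi_{j2}(\sigma^2)\,d\theta\,d\sigma^2 \tag{A.8}$$

where $\tilde\theta$ is (well) defined in Equation (A.4). As shown in the proof of
Theorem 3.1, if $\lambda_I < 1/r$, then
$0 < \exp\{-1/(2\sigma^2)(\theta - \tilde\theta)^T (X^T X - r X_I X_I^T)(\theta - \tilde\theta)\} \le 1$ so that

$$E_j(w^r_{\setminus I}(\theta, \sigma^2) \mid y) \le \text{const} \times
\int (\sigma^2)^{-(n-rI)/2} \exp\{-1/(2\sigma^2)\,\mathrm{RSS}^*_{\setminus I}(r)\}\,\pi_{j2}(\sigma^2)\,d\sigma^2, \quad j = 0, 1. \tag{A.9}$$

Applying inequality (A.9) with $j = 1$ and using the assumption that
$\mathrm{RSS}^*_{\setminus I}(r) > 0$, we have

$$E_1(w^r_{\setminus I}(\theta, \sigma^2) \mid y) \le \text{const} \times \int (\sigma^2)^{-(n-rI)/2}\,\pi_{12}(\sigma^2)\,d\sigma^2,$$

and the latter integral is finite by assumption. To prove the second part of
the corollary, we first note that Theorem 3.1 implies that if $\lambda_I > 1/r$, then
$E_0(w^r_{\setminus I}(\theta, \sigma^2) \mid y) = \infty$ for any $\pi_{11}$ and $\pi_{02} \in \mathcal{F}_2$, whereas, if $\mathrm{RSS}^*_{\setminus I}(r) < 0$,
then $E_0(w^r_{\setminus I}(\theta, \sigma^2) \mid y) = \infty$ for any $\pi_{02}$ in $\mathcal{F}_2$ having $\beta > -2/\mathrm{RSS}^*_{\setminus I}(r)$. As we
noted in the proof of Theorem 3.1, in both cases we can find a subset $C$ of $\mathbb{R}^k$
having positive Lebesgue measure such that, for any $\theta \in C$, $E_0(w^r_{\setminus I}(\theta, \sigma^2) \mid y)$
is infinite because the integral with respect to $\sigma^2$ does not exist in any
neighborhood of zero. Because $\pi_{12}$ is thick-tailed with respect to $\mathcal{F}_2$ then, for
every fixed $B > 0$, there exists a $\sigma_0^2$ such that $\pi_{12}(\sigma^2) > B\,\pi_{02}(\sigma^2)$ for any
$\sigma^2 < \sigma_0^2$. Thus, by Lemma 2.1, $E_0(w^r_{\setminus I}(\theta, \sigma^2) \mid y) = \infty$ for some $\pi_{02}$ in $\mathcal{F}_2$ implies
$E_1(w^r_{\setminus I}(\theta, \sigma^2) \mid y) = \infty$ as well.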

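Proof of Corollary 3.2

If $\lambda_I < 1/r$, inequality (A.9) holds for both the prior $\pi_{11}(\theta) \times \pi_{02}(\sigma^2)$ and
$\pi_{11}(\theta) \times \pi_{12}(\sigma^2)$. Furthermore, if $\pi_{02}(\sigma^2)$ is a prior distribution in $\mathcal{F}_2$ with
$\alpha > -(n - rI)/2$ and with $\beta$ such that $\mathrm{RSS}^*_{\setminus I}(r) > -2/\beta$, then, if $\lambda_I < 1/r$,
$\int (\sigma^2)^{-(n-rI)/2} \exp\{-1/(2\sigma^2)\,\mathrm{RSS}^*_{\setminus I}(r)\}\,\pi_{02}(\sigma^2)\,d\sigma^2$ is finite. By assumption, for
any fixed $b > 0$ there exists a $\delta > 0$ such that $\pi_{12}(\sigma^2) < b\,\pi_{02}(\sigma^2)$ for any $\sigma^2 < \delta$.
Next, split the integral on the right hand side in Equation (A.9) into the two
portions over $(0, \delta)$ and $[\delta, \infty)$. By Lemma 2.1,
$\int_0^\delta (\sigma^2)^{-(n-rI)/2} \exp\{-1/(2\sigma^2)\,\mathrm{RSS}^*_{\setminus I}(r)\}\,\pi_{02}(\sigma^2)\,d\sigma^2 < \infty$ implies
$\int_0^\delta (\sigma^2)^{-(n-rI)/2} \exp\{-1/(2\sigma^2)\,\mathrm{RSS}^*_{\setminus I}(r)\}\,\pi_{12}(\sigma^2)\,d\sigma^2 < \infty$.
For the portion over $[\delta, \infty)$, it is enough to observe that
$\int_\delta^\infty (\sigma^2)^{-(n-rI)/2} \exp\{-1/(2\sigma^2)\,\mathrm{RSS}^*_{\setminus I}(r)\}\,\pi_{12}(\sigma^2)\,d\sigma^2 \le \text{const} \times \int_\delta^\infty (\sigma^2)^{-(n-rI)/2}\,\pi_{12}(\sigma^2)\,d\sigma^2$,
which is finite by assumption.

Assume now that $\pi_{11}(\theta)$ is thick-tailed with respect to $\mathcal{F}_1$ and that $\lambda_I > 1/r$. It
follows from $\lambda_I > 1/r$ together with $\lambda_i \neq 1/r$ for all $i = 1, \ldots, I - 1$ that
$(X^T X - r X_I X_I^T)/\sigma^2$ is a non-positive-definite, non-singular matrix, $\forall \sigma^2 > 0$.
Thus, there exists a sequence $\{\theta_t^0\}$ with $\|\theta_t^0\| \to \infty$, as $t \to \infty$, and a
vector $\epsilon = (\epsilon_0, \ldots, \epsilon_{k-1})$, with $\epsilon_j > 0$ for all $j = 0, 1, \ldots, k - 1$, such that
$\lim_{t \to \infty} 1/(2\sigma^2)(\theta_t - \tilde\theta)^T (X^T X - r X_I X_I^T)(\theta_t - \tilde\theta) = -\infty$, for all sequences
$\{\theta_t\}$ such that $\theta_t^0 - \epsilon < \theta_t < \theta_t^0 + \epsilon$. Keeping in mind that $\pi_{11}(\theta)$ is thick-tailed
with respect to $\mathcal{F}_1$, then
$\lim_{t \to \infty} \exp\{-1/(2\sigma^2)(\theta_t - \tilde\theta)^T (X^T X - r X_I X_I^T)(\theta_t - \tilde\theta)\}\,\pi_{11}(\theta_t) = \infty$,
for all sequences $\{\theta_t\}$ such that $\theta_t^0 - \epsilon < \theta_t < \theta_t^0 + \epsilon$. It
follows from Equation (A.8) that $E_1(w^r_{\setminus I}(\theta, \sigma^2) \mid y) = \infty$.

Proof of Example 3.1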

To avoid heavy algebra, wc consider only the case = 6, although the result is 
true for an arbitrary ^o- If ha — 1/2 + then X^X — 2xixf — —2. 

Some algebraic manipulations yield 

/•OO 



/"OO 

Jo 



and for the interior integral with respect to $x$ the following bounds hold:

$$\int_0^1 \exp\{x/\sigma^2 - x^2\}\,dx \le \int_0^\infty \exp\{x/\sigma^2 - x^2\}\,x^{-1/2}\,dx
\le \exp\{1/\sigma^2\}\int_0^1 x^{-1/2}\,dx + \int_1^\infty \exp\{x/\sigma^2 - x^2\}\,dx. \tag{A.10}$$



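Furthermore, $\int_a^b \exp\{x/\sigma^2 - x^2\}\,dx \propto \exp\{(2\sigma^2)^{-2}\}$ for all $-\infty < a < b < \infty$.
This fact and the second inequality in (A.10) imply that if
$\pi_{12}(\sigma^2) \propto \exp(-(\sigma^2)^{-2} - \sigma^2)$ then

$$E(w^2_{\setminus I}(\theta, \sigma^2) \mid y) \le \text{const} \times \int_0^\infty (\sigma^2)^{-(n-2)/2}
\exp\{-(\sigma^2)^{-2} - \sigma^2 - (\mathrm{RSS}^*_{\setminus I}/2 - 1)/\sigma^2\}\,d\sigma^2
+ \text{const} \times \int_0^\infty (\sigma^2)^{-(n-2)/2}
\exp\{-\tfrac{3}{4}(\sigma^2)^{-2} - \sigma^2 - \mathrm{RSS}^*_{\setminus I}/(2\sigma^2)\}\,d\sigma^2 < \infty.$$

On the other hand, if $\pi_{12}(\sigma^2) \propto \exp(-(\sigma^2)^{-3/2} - \sigma^2)$, then the first
inequality in (A.10) yields

$$E(w^2_{\setminus I}(\theta, \sigma^2) \mid y) \ge \text{const} \times \int_0^\infty (\sigma^2)^{-(n-2)/2}
\times \exp\{-(\sigma^2)^{-3/2} - \sigma^2 - \mathrm{RSS}^*_{\setminus I}/(2\sigma^2) + (2\sigma^2)^{-2}\}\,d\sigma^2 = \infty.$$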



Proof of Theorem 4.1

To simplify the notation, in this proof we will write $A(\kappa)$ for $A(I, r, \kappa)$, $B(\kappa)$
for $B(I, r, \kappa)$ and $C$ for $C(I, r)$. Simple algebraic manipulations show that

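$$E(w^r_{\setminus I}(m, \sigma^2, \kappa) \mid v) \propto \int w^r_{\setminus I}(m, \sigma^2, \kappa)\,q(m, \sigma^2, \kappa)\,dm\,d\sigma^2\,d\kappa$$

$$= \int (\sigma^2)^{-(n-rI)/2-1} \exp\left\{-\frac{1}{2\sigma^2}\left[A(\kappa)m^2 - 2B(\kappa)m + C\right]\right\}\pi_2(\kappa)\,dm\,d\sigma^2\,d\kappa \tag{A.11}$$

$$= \int (\sigma^2)^{-(n-rI)/2-1} \exp\left\{-\frac{A(\kappa)}{2\sigma^2}\left[m - \frac{B(\kappa)}{A(\kappa)}\right]^2\right\}
\times \exp\left\{-\frac{1}{2\sigma^2}\left[C - \frac{B^2(\kappa)}{A(\kappa)}\right]\right\}\pi_2(\kappa)\,dm\,d\sigma^2\,d\kappa. \tag{A.12}$$

Suppose first that conditions (a), (b) and (c) are satisfied. It follows from (a)
that $A(\kappa) > 0$ for almost all $\kappa$, so that $\exp\{-A(\kappa)/(2\sigma^2)\,[m - B(\kappa)/A(\kappa)]^2\}$ is
proportional to the normal density with mean $B(\kappa)/A(\kappa)$ and variance $\sigma^2/A(\kappa)$.
Then, denoting by $\Phi$ the standard normal cumulative distribution function,
integral (A.12) reduces to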



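$$(2\pi)^{1/2} \int\!\!\int (\sigma^2)^{-(n-rI-1)/2-1} \exp\left\{-\frac{1}{2\sigma^2}\left[C - \frac{B^2(\kappa)}{A(\kappa)}\right]\right\}
\left[1 - \Phi\left(-\frac{B(\kappa)}{(\sigma^2 A(\kappa))^{1/2}}\right)\right] A^{-1/2}(\kappa)\,\pi_2(\kappa)\,d\sigma^2\,d\kappa \tag{A.13}$$

$$\le (2\pi)^{1/2} \int\!\!\int (\sigma^2)^{-(n-rI-1)/2-1} \exp\left\{-\frac{1}{2\sigma^2}\left[C - \frac{B^2(\kappa)}{A(\kappa)}\right]\right\}
\times A^{-1/2}(\kappa)\,\pi_2(\kappa)\,d\sigma^2\,d\kappa. \tag{A.14}$$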



Under conditions (b) and (c), the integrand in (A.14) is proportional to an
inverse gamma density for almost all $\kappa$ and integral (A.14) is proportional to

$$\int \left[C - \frac{B^2(\kappa)}{A(\kappa)}\right]^{-(n-rI-1)/2} A^{-1/2}(\kappa)\,\pi_2(\kappa)\,d\kappa. \tag{A.15}$$



Moreover, condition (c) implies that $[C - B^2(\kappa)/A(\kappa)]^{-(n-rI-1)/2}$ is a
bounded (continuous) function of $\kappa$ on $\mathbb{R}^+$ so that if $\int A^{-1/2}(\kappa)\,\pi_2(\kappa)\,d\kappa < \infty$
then integral (A.15) is finite. Condition (a) implies that X^ieJ "^i / X^iLi "^i 
= limK^oo/(X,K) < 1/r or, equivalently, that J2^^Ic'^ - i'^ - l)EjG2:^i > ^ 
so that, as $\kappa$ tends to infinity, $A(\kappa)$ behaves like $1/\kappa^2$. Hence, the finiteness of
$\int A^{-1/2}(\kappa)\,\pi_2(\kappa)\,d\kappa$ is guaranteed by $\int \kappa\,\pi_2(\kappa)\,d\kappa < \infty$. Sufficiency of conditions
(a), (b) and (c) follows.

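Assume now that conditions (a), (b) and (d) hold. Then $E(w^r_{\setminus I}(m, \kappa, \sigma^2) \mid v)$
is still proportional to integral (A.13). We will prove that under conditions (b)
and (d) the integral is finite. It follows from condition (d) that $B(\kappa) < 0$ almost
surely and, for every fixed $\epsilon > 0$, we can find a constant $M_1 > 0$ such that

$$1 - \Phi\left(-\frac{B(\kappa)}{(\sigma^2 A(\kappa))^{1/2}}\right) \le
\frac{1 + \epsilon}{(2\pi)^{1/2}} \times \frac{(\sigma^2 A(\kappa))^{1/2}}{|B(\kappa)|} \times
\exp\left\{-\frac{B^2(\kappa)}{2\sigma^2 A(\kappa)}\right\},$$

$\forall \sigma^2 < (1/M_1)\,B^2(\kappa)/A(\kappa)$. Therefore, an upper bound for integral (A.13) is

$$(1 + \epsilon) \int_0^\infty\!\!\int_0^{M(\kappa)} (\sigma^2)^{-(n-rI-2)/2-1} \exp\left\{-\frac{C}{2\sigma^2}\right\} |B(\kappa)|^{-1}\,\pi_2(\kappa)\,d\sigma^2\,d\kappa$$
$$+\ (2\pi)^{1/2} \int_0^\infty\!\!\int_{M(\kappa)}^\infty (\sigma^2)^{-(n-rI-1)/2-1} \exp\left\{-\frac{1}{2\sigma^2}\left[C - \frac{B^2(\kappa)}{A(\kappa)}\right]\right\} A^{-1/2}(\kappa)\,\pi_2(\kappa)\,d\sigma^2\,d\kappa
=: I_1 + I_2,$$

where $M(\kappa) := B^2(\kappa)/[M_1 A(\kappa)]$. With regard to integral $I_1$, observe that
$I_1 \le (1 + \epsilon) \int_0^\infty\!\int_0^\infty M(\kappa)\,(\sigma^2)^{-(n-rI)/2-1} \exp\{-C/(2\sigma^2)\}\,|B(\kappa)|^{-1}\,\pi_2(\kappa)\,d\sigma^2\,d\kappa$.
Under conditions (b) and (d), $(\sigma^2)^{-(n-rI)/2-1} \exp\{-1/(2\sigma^2)\,C\}$ is proportional
to an inverse gamma density so that $I_1 \le M_2 \int_0^\infty M(\kappa)/|B(\kappa)|\,\pi_2(\kappa)\,d\kappa
= (M_2/M_1) \int_0^\infty |B(\kappa)|/A(\kappa)\,\pi_2(\kappa)\,d\kappa$, for some constant $M_2 > 0$. Moreover, as $\kappa$
tends to $\infty$, it follows from condition (d) that $|B(\kappa)|$ behaves like $1/\kappa$ and, as
seen earlier, it follows from condition (a) that $A(\kappa)$ behaves like $1/\kappa^2$. Hence,
we conclude that $\int \kappa\,\pi_2(\kappa)\,d\kappa < \infty$ implies $\int |B(\kappa)|/A(\kappa)\,\pi_2(\kappa)\,d\kappa < \infty$.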

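With regard to $I_2$, under condition (b) we obtain:

$$I_2 \le \text{const} \times \int_0^\infty M(\kappa)^{-(n-rI-1)/2}
\max\left\{1, \exp\left\{-\frac{1}{2M(\kappa)}\left[C - \frac{B^2(\kappa)}{A(\kappa)}\right]\right\}\right\} A^{-1/2}(\kappa)\,\pi_2(\kappa)\,d\kappa.$$

Conditions (a) and (d) together yield $\sup_\kappa B^2(\kappa)/A(\kappa) < \infty$ and
$\inf_\kappa M(\kappa) > 0$, and the previous integral is finite if $\int A^{-1/2}(\kappa)\,\pi_2(\kappa)\,d\kappa < \infty$.
Sufficiency of conditions (a), (b), (d) follows.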
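
The argument under condition (d) controls a normal tail probability $1 - \Phi(\cdot)$ at a large positive argument. A standard inequality of this kind is the Mills-ratio bound $1 - \Phi(x) \le \phi(x)/x$ for $x > 0$, which becomes sharp as $x$ grows. The sketch below checks that standard bound numerically. It is offered as an illustration of the type of bound involved, not as the exact constant used in the proof, and the function names are hypothetical.

```python
import math


def normal_sf(x: float) -> float:
    """Upper tail 1 - Phi(x) of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mills_upper_bound(x: float) -> float:
    """phi(x) / x, an upper bound on 1 - Phi(x) for x > 0."""
    if x <= 0:
        raise ValueError("the bound requires x > 0")
    return math.exp(-0.5 * x * x) / (x * math.sqrt(2.0 * math.pi))


# Ratio of the tail to its bound at a few points; it approaches 1 from below.
ratios = {x: normal_sf(x) / mills_upper_bound(x) for x in (0.5, 1.0, 2.0, 5.0, 10.0)}
```

The ratios increase toward one, which is why a factor of the form $(1 + \epsilon)$ suffices once the argument is large enough.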

Conversely, if (e) holds, then the integrand in integral (A.11) approaches
infinity at an exponential rate as $\sigma^2$ goes to zero, whereas if $n - rI < 0$, the
integrand approaches zero too slowly as $\sigma^2$ goes to infinity. Both (e) and $n - rI < 0$
imply that integral (A.11) is infinite. Actually, non-integrability follows even if
$0 < n - rI < 1$. To show this suppose that $A(I, r, \kappa)m^2 - 2B(I, r, \kappa)m + C(I, r) > 0$
for almost all $\kappa$ and that $n - rI > 0$. Thus, integral (A.11) is proportional to

$$\int \left[A(I, r, \kappa)m^2 - 2B(I, r, \kappa)m + C(I, r)\right]^{-(n-rI)/2} \pi_2(\kappa)\,dm\,d\kappa,$$

but the interior integral with respect to $m$ is infinite if $(n - rI) < 1$. Thus
condition (f) implies $E(w^r_{\setminus I}(m, \kappa, \sigma^2) \mid v) = \infty$.

The proof of Theorem 5.1 relies on the following lemma which relates a bound
in terms of polar coordinates to the finiteness of the integral.

Lemma A.3. Suppose that $f(\beta)$ is continuous in $\beta$, $\beta \in \mathbb{R}^k$, and that, for
some $M < \infty$ and $b < 0$, $|f(\beta)| \le \exp(b\|\beta\|)$ for all $\beta$ such that $\|\beta\| > M$.
Then $\int_{\mathbb{R}^k} |f(\beta)|\,d\beta < \infty$.

Proof. Split the integral into two portions. For $\beta$ such that $\|\beta\| \le M$, we have
the integral of a continuous function over a compact set. This integral is finite.
The integral over the remaining portion of the space is also finite:

$$\int_{\|\beta\| > M} |f(\beta)|\,d\beta \le \int_{\|\beta\| > M} \exp(b\|\beta\|)\,d\beta = \int_M^\infty C_k r^{k-1} \exp(br)\,dr < \infty,$$

where $C_k r^{k-1}$ is the surface area of the $k$ dimensional sphere of radius $r$. □
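
The radial integral at the end of this proof has a closed form for integer dimension, which makes its finiteness concrete. The sketch below uses the standard surface-area constant $C_k = 2\pi^{k/2}/\Gamma(k/2)$ and the finite-sum expression for the upper incomplete gamma function at integer order. The function names are hypothetical.

```python
import math


def sphere_area_constant(k: int) -> float:
    """C_k such that C_k * r**(k-1) is the surface area of the radius-r sphere in R^k."""
    return 2.0 * math.pi ** (k / 2.0) / math.gamma(k / 2.0)


def exponential_tail_integral(k: int, b: float, M: float) -> float:
    """Integral over {||beta|| > M} of exp(b * ||beta||) in R^k, for b < 0 and M >= 0."""
    if b >= 0:
        raise ValueError("the integral is finite only for b < 0")
    a, x = -b, -b * M
    # Upper incomplete gamma function Gamma(k, x) for integer k >= 1.
    upper_gamma = math.factorial(k - 1) * math.exp(-x) * sum(
        x ** j / math.factorial(j) for j in range(k)
    )
    return sphere_area_constant(k) * upper_gamma / a ** k


# Example: in the plane (k = 2) with b = -1 and M = 0 the integral equals 2*pi.
value = exponential_tail_integral(2, -1.0, 0.0)
```

For any fixed $k$ the value is finite and decays exponentially in $M$, which is all the lemma needs.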
Proof of Theorem 5.1 



The expected $r$th moment of the case-deleted weight function can be written as
an integral against the prior times the likelihood:

$$\int w^r_{\setminus I}(\beta)\,\pi(\beta)\,f(y \mid X, \beta)\,d\beta =
\int \prod_{i \in I} \frac{(1 + \exp\{\beta^T x_i\})^{r-1}}{\exp\{(r-1)\beta^T x_i y_i\}}
\prod_{i \notin I} \frac{\exp\{\beta^T x_i y_i\}}{1 + \exp\{\beta^T x_i\}}\,\pi(\beta)\,d\beta.$$

In order to apply Lemma A.3, we consider a ray emanating from the origin
in an arbitrary direction, specified by a particular $\beta$ under the constraint that
$\|\beta\|_1 = 1$. In this fixed direction, the rate of decay (or increase) of the tail is
determined by the maximum contribution, either $1$ or $\exp\{\beta^T x_i\}$, from each
term of the form $1 + \exp\{\beta^T x_i\}$ in the products above. Collecting terms, we
have that the rate of decay is governed by

exp ^ 0^Xiyi - (j' - 1) ^ 0^Xiy^ - ^ max(0, 0^ x{)+ 

+ =exp(/i(/3,r,e)) 

We consider the expression above, and note that we can obtain a (decreasing)
exponential bound on the tail whenever the term inside the exponential is
negative. If the corresponding expression is negative for every direction, we can
construct a uniform bound which satisfies the assumption of the lemma which,
in turn, allows us to conclude that the $r$th moment of the case-deleted weight
function is finite.

The infinite $r$th moment case involves a positive value for some direction
specified by $\beta$. In this event, since $h(\beta, r, \epsilon)$ is continuous in $\beta$, we conclude
that there is a neighborhood of directions in which the integral along a ray is
infinite. Thus, the integral is infinite, and so is the $r$th moment of the
case-deleted weight function.
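
The direction-by-direction check described in this proof can be sketched for logistic regression with two coefficients. The sketch assumes the flat prior $\pi(\beta) \propto 1$, so that no prior term enters the exponent, and it keeps only the likelihood terms obtained by replacing each $\log(1 + \exp\{\beta^T x_i\})$ with $\max(0, \beta^T x_i)$. Under that assumption the exponent is positively homogeneous in $\beta$, so its sign does not depend on how directions are normalized. The function names, the angular grid, and the toy data are illustrative assumptions.

```python
import math


def h_flat(beta, r, X, y, deleted):
    """Tail exponent along direction beta under a flat prior (assumed form)."""
    total = 0.0
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        eta = sum(b * v for b, v in zip(beta, x_i))
        if i in deleted:
            # Deleted cases enter with power -(r - 1).
            total += -(r - 1) * eta * y_i + (r - 1) * max(0.0, eta)
        else:
            # Retained cases enter through the likelihood itself.
            total += eta * y_i - max(0.0, eta)
    return total


def negative_in_all_directions(r, X, y, deleted, n_dirs=360):
    """Scan a grid of directions in the plane; True if h < 0 on every one."""
    for j in range(n_dirs):
        angle = 2.0 * math.pi * j / n_dirs
        beta = (math.cos(angle), math.sin(angle))
        if h_flat(beta, r, X, y, deleted) >= 0.0:
            return False
    return True


# Toy data: intercept plus one covariate, responses not linearly separable.
X = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
y = [0, 1, 0, 1, 1]
```

In this toy example, deleting case 1 leaves the remaining responses separable, so `negative_in_all_directions(2, X, y, {1})` returns `False`. A grid scan is only a heuristic for the "every direction" requirement, and an analytical check should be preferred when available.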